Hey everyone, and welcome to this Python and OpenCV course. In this course, we'll be talking about everything you need to know to get started with OpenCV in Python. We're going to start off with the very basics: reading images and video, manipulating those media files with image transformations, and how to draw shapes and put text on those files. Then we're going to move on to the more advanced parts of OpenCV: switching between color spaces, bitwise operators, masking, histograms, edge detection and thresholding. And finally, to sum things up, we'll be talking about face detection and face recognition in OpenCV, so how to detect and find faces in an image and how to recognize them using built-in methods. In the last video, we'll be building a deep computer vision model to classify between the characters in The Simpsons based off some images. All material discussed will be available on my GitHub page, and all relevant links will be put up in the description below. If that sounds exciting, don't forget to head over and subscribe to my channel, and I'll see you in the course.
Hey everybody, and welcome to this Python and OpenCV course. Over the next couple of videos, we're going to be talking about using the OpenCV library to perform all sorts of image- and video-related processing and manipulation. Now, I won't be delving into what OpenCV really is, but just to be brief: it is a computer vision library that is available in Python, C++ and Java. Computer vision is an application of deep learning that primarily focuses on deriving insights from media files, that is, images and video. Now, I'm going to assume that you already have Python installed on your system, and a good way to check this is by going to a terminal and typing python --version. Make sure you're running a version of Python that is at least 3.7 or above. What we do in this course won't necessarily work in older versions of Python, and especially not Python 2, so just make sure you have a recent version installed; if not, go to python.org and download the latest version from there. Now, assuming that you've done this, we can proceed to installing the packages that we require in this course. The first one is OpenCV.
So go ahead and do a pip install opencv-contrib-python. Now, sometimes you may find people telling you to install just opencv-python. Well, opencv-python is basically the main module of OpenCV, while opencv-contrib-python includes everything in the main module as well as the contribution modules provided by the community. So this is the one I recommend you install, as it includes all of OpenCV's functionality. You may also notice that OpenCV tries to install the NumPy package. NumPy is a scientific computing package in Python that's extensively used in matrix and array manipulations, transformations, reshaping and things like that. We'll be using NumPy in some of the videos in this course, but don't worry if you've never used it before; it's simple and relatively easy to get started with. The next package I'd like you to install is caer, so go ahead and do pip install caer. Now, a slight disclaimer: this is a package that I built to basically help speed up your workflow. caer is a set of utility functions that will prove super useful to you in your computer vision journey, with a ton of helper functions that help speed things up. Although we're not going to be using this for a good part of this course (in fact, we'll only begin to use it in the last video, when we're building a deep computer vision model), I recommend you install it now so you don't have to worry about the installation process later on. If you're interested in contributing to this package, or simply want to explore the codebase, I'll
leave a link to its GitHub page in the description below. Okay, that's it for this video. In the next video, we'll be talking about how to read images and video in OpenCV. So I'll see you guys in the next video.

Hey everybody, and welcome back to another video. In this video, we're going to be talking about how to read images and video in OpenCV. So I have a bunch of images in this photos folder and a couple of videos in this videos folder. In the first half of this video we'll be talking about how to read in images in OpenCV, and towards the end we'll be talking about how to read in videos. So let's start off by creating a new file and calling it read.py. The first thing we have to do is import cv2 as cv. The way we read in images in OpenCV is by making use of the cv.imread method. This method takes in a path to an image and returns that image as a matrix of pixels. Specifically, we're going to try to read this image of a cat here, so we're going to say 'photos/cat.jpg', and we're going to capture this image in a variable called img. You can also provide absolute paths, but since this photos folder is inside my current working directory, I'm going to reference those images relatively. Once we've read in our image, we can display it by using the cv.imshow method, which displays the image in a new window. The two parameters we need to pass into this method are the name of the window, in this case 'Cat', and the actual matrix of pixels to display, which in this case is img. And before we move ahead, I do want to add an additional line: cv.waitKey(0). cv.waitKey is basically a keyboard binding function; it waits for a specified delay, in milliseconds, for a key to be pressed. If you pass in zero, it waits for an infinite amount of time for a keyboard key to be pressed. Don't worry too much about this; it's not really that important for this course, but we will be discussing some parts of it towards the end of this video. So let's save this and run it by saying python read.py, and the image is displayed in a new window. Cool. Now, this was a small image, an image of size 640 by 427. Next, we're going to try to read in an image of the same cat, but a much larger version, a 2400 by 1600 image. So we're going to say 'photos/cat_large.jpg'. Let's save that and run, and as you can see, this image goes way off screen. The reason for this is that the dimensions of this image are far greater than the dimensions of the monitor that I'm currently working on. Currently, OpenCV does not have an inbuilt way of dealing with images that are far larger than your computer screen. There are ways to mitigate this issue, and we'll be discussing them in the next video when we talk about resizing and rescaling frames and images. But for now, just know that large images will possibly go off screen.
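To put that together, here's a minimal sketch of the read.py file up to this point; the 'photos/cat.jpg' path assumes the folder layout described in the video.

```python
import cv2 as cv

# Read the image in as a matrix of pixels (BGR order by default)
img = cv.imread('photos/cat.jpg')

# Display it in a new window named 'Cat'
cv.imshow('Cat', img)

# Wait indefinitely for a key press so the window stays open
cv.waitKey(0)
```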
So that's it for reading images; we can then move on to reading videos in OpenCV. What we're going to do is read in this video of a dog, and the way we read in videos is by creating a capture variable and setting it equal to cv.VideoCapture. Now, this method takes either an integer argument, like 0, 1, 2 and so on, or a path to a video file. You would provide an integer argument if you are using your webcam or a camera that is connected to your computer. In most cases, your webcam would be referenced by the integer 0, but if you have multiple cameras connected to your computer, you can reference them with the appropriate argument: 0 would reference your webcam, 1 would reference the first camera connected to your computer, 2 would reference the second camera, and so on. In this video, though, we'll be looking at how to read an already existing video from a file path. Specifically, we'll be reading this video of a dog, and the way we do that is by providing the path, so 'videos/dog.mp4'. Now, here's where reading videos is a little different from reading images: in the case of videos, we actually use a while loop and read the video frame by frame. So we're going to say while True, and the first thing we want to do inside this loop is say isTrue, frame = capture.read(). This capture.read() reads in the video frame by frame; it returns the frame and a boolean that says whether the frame was successfully read in or not. To display the video, we display each individual frame, so we say cv.imshow, call the window 'Video', and pass in the frame. Finally, for some way to stop the video from playing indefinitely, we say: if cv.waitKey(20) & 0xFF == ord('d'), then we want to break out of this while loop. And once that's done, we can release the capture pointer and destroy all windows. So, just to recap: the capture variable is an instance of the VideoCapture class. Inside the while loop, we grab the video frame by frame using the capture.read method, and we display each frame of the video using the cv.imshow method. For some way to break out of the while loop, we say if cv.waitKey(20) & 0xFF == ord('d'), which basically says that if the letter d is pressed, break out of the loop and stop displaying the video. And finally, we release the capture device and destroy all the windows, since we don't need them anymore.
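Here's roughly how that video-reading loop looks as code; again, the 'videos/dog.mp4' path assumes the folder layout from the video.

```python
import cv2 as cv

# Point the capture at an existing video file (or pass 0 for a webcam)
capture = cv.VideoCapture('videos/dog.mp4')

while True:
    # Grab the video frame by frame; isTrue reports whether the read succeeded
    isTrue, frame = capture.read()

    # Display the current frame
    cv.imshow('Video', frame)

    # If the letter 'd' is pressed, stop playing the video
    if cv.waitKey(20) & 0xFF == ord('d'):
        break

capture.release()
cv.destroyAllWindows()
```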
So let's save that and run, and we get the video displayed in a window like this. But once it's done, you'll notice that the video suddenly stops and you get this error; more specifically, a -215 assertion failed error. Now, if you ever get an error like this, it means, in almost all cases, that OpenCV could not find a media file at the particular location you specified. The reason it happened here is that the video ran out of frames: OpenCV could not find any more frames after the last frame of the video, so it unexpectedly broke out of the while loop by raising a cv2 error. And you're going to get the same error if we comment the video code out, uncomment the image code, and specify a wrong path to the image. Save that and run, and we get the exact same error. Again, this says that OpenCV could not find the image or the video frame at that particular location; basically, it could not be read. So that's pretty much it for this video. We talked about how to read in images in OpenCV and how to read in videos using the VideoCapture class. In the next video, we'll be talking about how to rescale and resize images and video frames in OpenCV. So see you then.

Hey everyone, and welcome back.
In this video, we're going to be talking about how to resize and rescale images and video frames in OpenCV. Now, we usually resize and rescale video files and images to prevent computational strain. Large media files tend to store a lot of information, and displaying them takes up a lot of processing that your computer needs to assign; by resizing and rescaling, we're trying to get rid of some of that information. Rescaling a video implies modifying its height and width to a particular height and width. Generally, it's best practice to downscale, that is, change the width and height of your video files to values smaller than the original dimensions. The reason for this is that most cameras, your webcam included, do not support going higher than their maximum capability: if a camera shoots in 720p, chances are it's not going to be able to shoot in 1080p or higher. So, to rescale a video frame or an image, we can create a function, def rescale_frame, and we can pass in the frame to be resized and a scale value, which by default we're going to set to 0.75. What I'm going to do next is say width = frame.shape[1] * scale, and then do the same thing for the height. Remember, frame.shape[1] is basically the width of your frame or image, and frame.shape[0] is its height. Since width and height need to be integers, I can convert these floating point values by casting them to int. Then we create a variable called dimensions and set it equal to a tuple of (width, height), and we can return cv.resize(frame, dimensions, interpolation=cv.INTER_AREA). We'll be talking about cv.resize in an upcoming video, but for now, just know that it resizes the frame to a particular dimension. So that's all this function does: it takes in a frame and scales it by a particular scale value, which by default is 0.75.
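As a quick sketch, the rescale_frame function described above looks something like this:

```python
import cv2 as cv

def rescale_frame(frame, scale=0.75):
    # frame.shape[1] is the width, frame.shape[0] is the height
    width = int(frame.shape[1] * scale)
    height = int(frame.shape[0] * scale)
    dimensions = (width, height)

    return cv.resize(frame, dimensions, interpolation=cv.INTER_AREA)
```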
So let's actually try to see this in action. Let's go back to read.py, grab that code, and paste it in; we don't need the image part for now, so uncomment the video part. What I'm going to do is, after I've read in the frame, create a new frame called frame_resized and set it equal to rescale_frame(frame), leaving the scale value at the default of 0.75. And we can display this resized video by passing in frame_resized. So let's save that and run python rescale.py. That was an error; okay, we don't need that line. Let's close it out, save and run. And this was our original video, and this is the resized video, rescaled by 0.75, so to 75%. We can modify this by changing the scale value to maybe 0.2, so we're rescaling to 20%, and we get an even smaller video in a new window. So let's close that out. Now, you can also apply this to images. Let's uncomment the image code, change the path to cat.jpg, and do cv.imshow('Image Resized', ...), passing in the resized image, which we create by calling rescale_frame and passing in img. Let's save that and run. This is the small video, so we're not concerned with that; this is the big image, the large image, and this is the resized version of that image. So let's close that out. Now, there is another way of rescaling or resizing video frames specifically, and that's by using the capture.set method.
Now, this is specifically for videos and won't work for images. So let's go ahead and try it. Let's call this def change_res, since we're changing the resolution of the video, and we can pass in a width and a height. What we're going to do is say capture.set(3, width), and do the same thing with capture.set(4, height). The 3 and the 4 here stand for properties of this capture class: 3 references the width, and 4 references the height. You could also use this to, say, change the brightness of the video, and I believe you can reference that by setting the property to 10. But for now, we're interested in the width and the height. Now, I do want to point out that the rescale_frame function will work for images, videos, and live video; basically, you can use it for everything. The change_res function, on the other hand, only works for live video, that is, video you read in from an external camera or your webcam, video that is being captured currently. It's not going to work on standalone video files, video files that already exist; it just doesn't work. So if you're trying to change the resolution of live video, go with the change_res function; if you're trying to change the resolution of an already existing video, go with the rescale_frame function.
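For reference, here's a minimal sketch of the change_res function; the capture from a webcam (index 0) is my assumption, since this only works for live video.

```python
import cv2 as cv

# Live video only (e.g. a webcam); this won't work on existing video files
capture = cv.VideoCapture(0)

def change_res(width, height):
    # 3 and 4 are the capture properties for frame width and frame height
    capture.set(3, width)
    capture.set(4, height)
```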
So that's pretty much it for this video; we talked about how to resize and rescale video frames and images in OpenCV. In the next video, we'll be talking about how to draw shapes and write text on an image. So that's everything; I'll see you guys in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to be talking about how to draw and write on images. So go ahead and create a new file and call it draw.py. We're going to import cv2 as cv, and we're going to import the NumPy package that OpenCV installed previously, importing it as np. We'll read in an image by saying img = cv.imread('photos/cat.jpg'), display that image in a new window, and do a cv.waitKey(0). Now, there are two ways we can draw on images: by drawing on standalone images, like this image of a cat, or by creating a dummy image, a blank image, to work with. The way we create a blank image is by saying blank = np.zeros(), give it a shape of (500, 500), and give it a data type of 'uint8', which is basically the data type of an image. If you want to see what this image looks like, we can say cv.imshow('Blank', blank). Save that and run python draw.py, and this is the blank image that we can draw on. So we're going to be using that instead of drawing on this cat image, but feel free to use the cat image if you'd like. The first thing we're going to do is try to paint the image a certain color, and the way we do this is by saying blank, referencing all the pixels with [:], and setting this equal to 0, 255, 0, so painting the entire image green. We can display this image by saying cv.imshow('Green', blank). Save that and run. Could not broadcast? Okay, we need to give the blank image a shape with a 3 at the end, (500, 500, 3); basically, we're giving it a shape of height, width, and the number of color channels. So just keep that in mind. Save that, and this is the green image that we get. Cool. We can even change this and try changing it to red, 0, 0, 255. Save that, and we get a red image over here. Now, you can also color a certain portion of the image by giving it a range of pixels, so we can say blank[200:300, 300:400]. Save that and run, and you get a red square in the image.
The next thing we're going to do is draw a rectangle, and the way we do this is by using the cv.rectangle method. This method takes in an image to draw the rectangle over, which in this case is blank, and it takes in point 1, point 2, a color, a thickness, and a line type if you'd like. Point 1 will be (0, 0), which is the origin, and we can go all the way across to (250, 250). Let's give it a color of (0, 255, 0), which is green, and a thickness of, let's say, 2, which is basically the thickness of the border. Once that's done, we can display this image; let's call the window 'Rectangle' and pass in the blank image. We can comment the earlier code out since we don't need it anymore, and we get a green rectangle that goes all the way from the origin to (250, 250). You can play around with it if you like, so we can go from 250 to maybe 500, and it goes all the way across the image, basically dividing it in half. Now, there is a way of filling this rectangle in with a color: instead of saying thickness=2, we say thickness=cv.FILLED, and that fills in the rectangle, so we get a solid green rectangle. Alternatively, you can also specify the thickness as -1, and we get the same result. What we can also do, instead of giving it fixed values like 250 and 500, is say img.shape[1] // 2 and img.shape[0] // 2. Let's save that and run. 'img' is not defined; ah, this should be blank, not img. Save that and run, and we get a nice little rectangle, or square, if you will, in this image. What this did is scale the rectangle so that, instead of covering that whole region, it has dimensions half those of the original image.
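Here's a compact sketch of the draw.py steps so far, combining the blank canvas, the painted portion, and the rectangle into one runnable snippet:

```python
import cv2 as cv
import numpy as np

# A blank 500x500 image with 3 color channels (height, width, channels)
blank = np.zeros((500, 500, 3), dtype='uint8')

# Paint a portion of the image red (BGR order, so red is 0,0,255)
blank[200:300, 300:400] = 0, 0, 255

# Draw a filled green rectangle half the size of the image;
# thickness=2 would draw just the border instead
cv.rectangle(blank, (0, 0), (blank.shape[1]//2, blank.shape[0]//2),
             (0, 255, 0), thickness=cv.FILLED)

cv.imshow('Rectangle', blank)
cv.waitKey(0)
```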
So, moving on, let's try to draw a circle. This is also fairly straightforward: we do cv.circle, and we pass in the blank image and give it a center, which is basically the coordinates of the center. For now, let's set this to the midpoint of the image by saying (250, 250). Let's give it a radius of 40 pixels, a color of (0, 0, 255), which is red in BGR, and a thickness of, let's say, 3. We can display this image with cv.imshow('Circle', blank), and we get a nice little circle over here that has its center at (250, 250) and a radius of 40 pixels. Again, you can also fill in this circle by giving it a thickness of -1, and we get a nice little dot in the middle. Cool. Now, there's something else that I forgot, and that is how to draw a line, a standalone line, on the image. That again is fairly straightforward: we use the cv.line method, which takes in the image to draw the line on and two points; let's just copy the points from before. This draws a line from (0, 0) to half the image dimensions, (250, 250), with a color of (0, 255, 0). Let's set this to full white, (255, 255, 255), and the thickness you can specify as 3. We can display this image with cv.imshow('Line', blank), and we get a line that goes all the way from (0, 0) to (250, 250). Let's try to play around with this, and draw a line from (100, 250) that goes all the way to (300, 400). Save that, and you get a line that goes from (100, 250) to (300, 400). Cool.
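In code, the circle and line steps look roughly like this:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((500, 500, 3), dtype='uint8')

# A filled red circle (thickness=-1 fills it) at the midpoint of the image
cv.circle(blank, (250, 250), 40, (0, 0, 255), thickness=-1)

# A white line of thickness 3 from (100, 250) to (300, 400)
cv.line(blank, (100, 250), (300, 400), (255, 255, 255), thickness=3)

cv.imshow('Shapes', blank)
cv.waitKey(0)
```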
And finally, the last thing we'll discuss in this video is how to write text on an image. So that's right, text on an image. The way we do this is very straightforward: we do cv.putText, and this will put text on the blank image. We specify what we want to write, so let's say 'Hello'. We give it an origin, which is basically where we want to draw the text from; let's set this to (225, 225). We also specify a font face. OpenCV comes with inbuilt fonts, and we'll be using cv.FONT_HERSHEY_TRIPLEX; you also have complex, duplex, plain, script simplex, and a lot of other inbuilt fonts, but for now let's use triplex. Let's give this a font scale, which is basically how much you want to scale the font by; we'll set this to 1.0, since we don't want to scale the font. Let's give it a color of (0, 255, 0) and a thickness of 2. Comment the earlier code out, and we can display this image with cv.imshow('Text', blank), and we get some text placed on the image. If you play around with it and say 'Hello, my name is Jason', save and run, it goes off screen. With long text there's no real way of handling this except maybe changing the origin a bit, so we can do that by saying (0, 225), and now it starts from 0 and says 'Hello, my name is Jason'.
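A minimal sketch of the text step:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((500, 500, 3), dtype='uint8')

# Draw green text starting at x=0, y=225 so longer strings stay on screen
cv.putText(blank, 'Hello, my name is Jason', (0, 225),
           cv.FONT_HERSHEY_TRIPLEX, 1.0, (0, 255, 0), thickness=2)

cv.imshow('Text', blank)
cv.waitKey(0)
```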
So that's it for this video. We talked about how to draw shapes, how to draw lines, rectangles and circles, and how to write text on an image. Now, in the next video, we'll be talking about basic functions in OpenCV that you're most likely going to come across in whatever project in computer vision you end up doing. So with that, I'll see you guys in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to be talking about the most basic functions in OpenCV that you're going to come across in whatever computer vision project you end up building. So let's start off with the first function, and that is converting an image to grayscale. We've read in an image and displayed it in a new window, and currently this is a BGR image, a three-channel blue, green and red image. Now, there are ways in OpenCV to convert those BGR images to grayscale, so that you only see the intensity distribution of pixels rather than the color itself. The way we do that is by saying gray = cv.cvtColor; we pass in the image we want to convert from, which is img, and we specify a color code, which is cv.COLOR_BGR2GRAY, since we're converting a BGR image to a grayscale image. And we can go ahead and display this image by saying cv.imshow('Gray', gray). Save that and run python basic.py. This was the original image, and this is the grayscale image. Let's try this with another image; this is an image of a park in Boston, so save, and maybe change the window name to 'Boston'. And this is the BGR image in OpenCV, and this is its corresponding grayscale image. So nothing too fancy; we've just converted from a BGR image to a grayscale image.
The next function we're going to discuss is how to blur an image. Now, blurring an image essentially removes some of the noise that exists in it. For example, there may be some extra elements in an image because of bad lighting when the image was taken, or maybe some issues with the camera sensor, and so on. One of the ways we can reduce this noise is by applying a slight blur. There are way too many blurring techniques to cover now, and we'll get into them in the advanced part of this course, but for now we're just going to use a Gaussian blur. So we're going to create a blurred image by saying blur = cv.GaussianBlur. This takes in a source image, which is img, and a kernel size, which is a two-element tuple; it's basically the window size that OpenCV uses to compute the blur over the image. We'll get into this in the advanced part of the course, so don't worry too much about it; just know that this kernel size has to be odd. So let's start real simple and keep the kernel size at (3, 3). Another thing we have to specify is cv.BORDER_DEFAULT. Go ahead and display this image with cv.imshow('Blur', blur). Now, you'll be able to notice some of the differences in this image, and that's because of the blur that's applied to it: the people in the background are pretty clear in the original image, and over here they're slightly blurred. To increase the blur in this image, we can increase the kernel size from (3, 3) to (7, 7); save that and run, and this image is way more blurred than the previous one.
So that's it. The next function we're going to discuss is how to create an edge cascade, which is basically trying to find the edges that are present in the image. Now, there are many edge detectors available, but for this video we're going to use the Canny edge detector, which is pretty famous in the computer vision world. Essentially, it's a multi-step process that involves a lot of blurring and then a lot of gradient computations and things like that. So we're going to say canny = cv.Canny; we pass in the image, and we pass in two threshold values, which for now I'm going to set to 125 and 175. Let's go ahead and display this image with cv.imshow('Canny Edges', canny). Save that and run, and these were the edges that were found in the image. As you can see, there were hardly any edges found in the sky, but a lot of features in the trees and the buildings, and quite a few features and edges in the grass. We can reduce some of these edges by blurring the image first, and the way we do that is, instead of passing in img, we pass in the blur. Save that and run, and as you can see, there were far fewer edges found in the image. This is how you can reduce the number of edges found: by a lot, by applying a lot of blur, or get rid of just some of the edges by applying a slight blur.
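Putting those first few functions together, here's roughly what basic.py looks like so far; the 'photos/park.jpg' path is my assumption for the Boston park image's filename.

```python
import cv2 as cv

img = cv.imread('photos/park.jpg')  # assumed path to the park image
cv.imshow('Boston', img)

# Convert the BGR image to grayscale
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
cv.imshow('Gray', gray)

# Apply a Gaussian blur; a larger (odd) kernel size means more blur
blur = cv.GaussianBlur(img, (7, 7), cv.BORDER_DEFAULT)
cv.imshow('Blur', blur)

# Find the edge cascade; blurring first reduces the edges found
canny = cv.Canny(blur, 125, 175)
cv.imshow('Canny Edges', canny)

cv.waitKey(0)
```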
Now, the next function we're going to discuss is how to dilate an image using a specific structuring element, and the structuring element we're going to use is these Canny edges that were found. So we're going to say dilated = cv.dilate. This takes in the structuring element, which is basically the Canny edges, a kernel size, which we'll specify as (3, 3) for now, and iterations=1. Dilation can be applied over several iterations at a time, but for now we'll stick with one. Go ahead and display this image by saying cv.imshow('Dilated', dilated). Save that and run: if these were the edges, these are the dilated edges. We can maybe increase the kernel size to (7, 7) and see what that does. Hold on; nothing much happened, not much difference there, so let's try increasing the number of iterations to maybe 3. And it's definitely way thicker, though you're only going to see subtle differences in the amount of features and edges that you find. Now, there is a way of eroding this dilated image to try to get back the structuring element. It's not going to be perfect, but it will work in some cases. So we're going to say eroded = cv.erode; it takes in the dilated image, a kernel size, let's start off with (3, 3), and iterations=1 just for now. And we display this image with cv.imshow('Eroded', eroded). If this was your structuring element, and this was your dilated image, this is the result you get from eroding that image. Now, it isn't the same as the structuring element, but you can just about make out the features. You can see that between this and this there is a subtle change in the edges and their thickness, so we can maybe try to match the kernel size and iteration values, so that there's an attempt to get back the edge cascade. And yes, we got the edges back: if you compare these two, they look pretty much the same, and the edges are the same. So essentially, if you follow the same steps, you can, in most cases, get back the same edge cascade.
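A short sketch of the dilate and erode steps, continuing the basic.py sketch above (where canny holds the edge cascade); note I'm passing the kernel size the same way the video describes it.

```python
# Dilate the edges; more iterations and a bigger kernel mean thicker edges
dilated = cv.dilate(canny, (7, 7), iterations=3)
cv.imshow('Dilated', dilated)

# Erode with matching values to attempt to get back the edge cascade
eroded = cv.erode(dilated, (7, 7), iterations=3)
cv.imshow('Eroded', eroded)

cv.waitKey(0)
```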
And probably the last functions we're going to discuss are how to resize and crop an image. So we'll start with resizing. We came across resizing video frames and images in one of the previous videos, but we're going to touch on the cv.resize function just a bit more. We're going to say resized = cv.resize; this takes in an image to be resized, and a destination size, which we'll set to (500, 500). So this essentially takes this image of the park and resizes it to 500 by 500, ignoring the aspect ratio. We display this image by saying cv.imshow('Resized', resized). Save that and run, and let's go back: if this is the original image, this is the image resized to 500 by 500. Now, by default there is an interpolation that occurs in the background, and that is cv.INTER_AREA. This interpolation method is useful if you are shrinking the image to dimensions smaller than the original. But in some cases, if you are trying to enlarge the image and scale it to much larger dimensions, you would probably use cv.INTER_LINEAR or cv.INTER_CUBIC. Cubic is the slowest among them all, but the resulting image is of a much higher quality than with INTER_AREA or INTER_LINEAR. So let's touch on cropping. Cropping works by exploiting the fact that images are arrays, so we can employ something called array slicing and select a portion of the image on the basis of pixel values. We can say cropped = img[50:200, 200:400], and we can display this image with cv.imshow('Cropped', cropped). And this is a cropped region; let's go back to the original image, and if you try to superimpose them, yeah, it's basically this portion.
So that's pretty much it for this video. We talked about the most basic functions in OpenCV: converting an image to grayscale, applying some blur, creating an edge cascade, dilating the image, eroding that dilated image, resizing an image, and cropping an image using array slicing. In the next video, we're going to be talking about image transformations in OpenCV: translation, rotation, resizing, flipping and cropping. So if you have any questions, leave them in the comments below; otherwise, I'll see you guys in the next video.

Hey everyone, and welcome back to this Python and OpenCV course. In this section, we're going to cover basic image transformations. Now, these are common techniques that you would likely apply to images, including translation, rotation, resizing, flipping and cropping. So let's start off with translation. Translation is basically shifting an image along the x and y axes; using translation, you can shift an image up, down, left, right, or with any combination of the above. So, to translate an image, we can create a translating function; we're going to call this def translate. This translation function takes in an image to translate, and an x and y; x and y basically stand for the number of pixels you want to shift along the x axis and the y axis, respectively. To translate an image, we need to create a translation matrix, so we're going to say transMat = np.float32, and this takes in a list with two lists inside of it: the first is [1, 0, x], and the second is [0, 1, y]. And since we're using NumPy, we need to import numpy as np. Once we've created our translation matrix, we can get the dimensions of the image by saying dimensions is a tuple of img.shape[1], which is the width, and img.shape[0], which is the height, and we can return cv.warpAffine, which takes in the image, the transMat matrix, and the dimensions. With that, we can translate our image. Before we do, I do want to mention the sign conventions: negative x values translate the image to the left, negative y values shift it up, positive x values shift it to the right, and, as you've guessed, positive y values shift it down.
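Here's a sketch of the translate function as described:

```python
import cv2 as cv
import numpy as np

def translate(img, x, y):
    # Translation matrix: -x shifts left, -y up, +x right, +y down
    transMat = np.float32([[1, 0, x], [0, 1, y]])
    dimensions = (img.shape[1], img.shape[0])
    return cv.warpAffine(img, transMat, dimensions)
```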
So let's create our first translated image, setting it equal to translate; we're going to pass in the image, and we're going to shift the image right by 100 pixels and down by 100 pixels. Then cv.imshow('Translated', translated). Save that and run python transformations.py, and this is your translated image: it was shifted down by 100 pixels and shifted to the right by 100 pixels. So let's change that, and shift the image left by 100 pixels and down by 100 pixels; we pass in a negative value for x, and it moved to the left. Feel free to play around with these values as you see fit; just remember that negative x shifts left, negative y shifts up, positive x shifts right, and positive y shifts down. Moving on, let's talk about rotation. Rotation is exactly what it sounds like: rotating an image by some angle. OpenCV allows you to specify any rotation point that you'd like to rotate the image around. Usually that's the center, but with OpenCV you could specify any arbitrary point: it could be any corner, it could be 10 pixels to the right and 40 pixels down, and you can rotate the image around that point.
So, to rotate the image, we can create a rotating function; let's call it def rotate. This takes in an image, an angle to rotate by, and a rotation point, which we're going to default to None. We grab the height and width of the image by setting (height, width) equal to img.shape[:2], the first two values. If the rotation point is None, we assume that we want to rotate around the center, so we say rotPoint = (width // 2, height // 2). Then we can create the rotation matrix, like we did with the translation matrix, by saying rotMat = cv.getRotationMatrix2D; we pass in the rotation point, the angle to rotate by, and a scale value. We're not interested in scaling the image when we rotate it, so we can set this to 1.0. Finally, we set a dimensions variable equal to the width and the height, and we return the rotated image, which is cv.warpAffine with the image, rotMat, and the destination size, which is dimensions. And that's it; that's all we need for this function.
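A sketch of the rotate function:

```python
import cv2 as cv

def rotate(img, angle, rotPoint=None):
    height, width = img.shape[:2]

    # Default to rotating around the center of the image
    if rotPoint is None:
        rotPoint = (width // 2, height // 2)

    # Positive angles rotate counterclockwise, negative angles clockwise
    rotMat = cv.getRotationMatrix2D(rotPoint, angle, 1.0)
    dimensions = (width, height)
    return cv.warpAffine(img, rotMat, dimensions)
```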
So we can create a rotated image by setting it equal to rotate, and we can rotate the original image by 45 degrees. Let's display this image, calling it 'Rotated' and passing in rotated. Save that and run, and this is your rotated image; as you can see, it was rotated counterclockwise by 45 degrees. If you wanted to rotate this image clockwise, just specify a negative value for the angle, and it will rotate clockwise. Now, you can also rotate an already rotated image, that is, take this image and rotate it by a further 45 degrees. So let's say rotated_rotated = rotate(rotated, -45), and we can display it with cv.imshow('Rotated Rotated', rotated_rotated). And this is your rotated, rotated image. Now, the reason these black regions appear is that wherever there's no part of the image, the result is black by default. So when you took this image and rotated it by 45 degrees, you rotated the image but introduced these black triangles; if you then try to rotate that image further by some angle, you're also rotating those black triangles along with it, and that's why you get this kind of skewed image, with additional black triangles included over here. So save yourself the trouble and just add up the angles to get the final angle: we can change this to rotate the original image by -90, and this is essentially the image we were going for. Instead of taking the image, rotating it 45 degrees clockwise, and rotating that result by a further 45 degrees, add those two angle values together.
So far, we've covered two image transformations, translation and rotation. Now we're going to explore how to resize an image. This is nothing too different from what we've discussed previously, but let's touch on it just a bit. We can create a resized variable and set it equal to cv.resize; we pass in the image to resize and a destination size of maybe (500, 500). By default, the interpolation is cv.INTER_AREA; you can maybe change this to cv.INTER_LINEAR or cv.INTER_CUBIC. It's definitely a matter of preference, depending on whether you're enlarging or shrinking the image. If you're shrinking the image, you'd probably go for INTER_AREA, or stick with the default; if you're enlarging, you could use INTER_LINEAR or INTER_CUBIC. Cubic is slower, but the resulting image is of a much higher quality. Again, nothing too different from what we discussed before. We can display this image with cv.imshow('Resized', resized); save that and run, and we've got a resized image. Next up we have flipping: how to flip an image.
We don't need to define a function for this; we just create a variable and set it equal to cv.flip. This takes in an image and a flip code. Now, this flip code can be 0, 1, or -1: 0 basically implies flipping the image vertically, that is, over the x axis; 1 specifies that you want to flip the image horizontally, over the y axis; and -1 implies flipping the image both vertically and horizontally. So let's start off with 0, flipping it vertically: cv.imshow('Flip', flip), save and run, and this is the image flipped vertically. Let's try out a horizontal flip, and we get a horizontal flip. To check whether it was really flipped horizontally, we can bring the two images together; if they look like mirror images, then it was flipped horizontally. This is kind of a symmetric image, so it's not that obvious, but bring them together and you can maybe make out the difference. We could also flip the image both vertically and horizontally by specifying -1 as the flip code, and the image is flipped both vertically and horizontally: mirror images, but reversed mirror images. And the last method is cropping. We've discussed cropping already, so I'm just going to touch on it: we can create a variable called cropped, set it equal to img, and perform some array slicing, so img[200:400, 300:400]. Save that and run; we still need to display it, so cv.imshow('Cropped', cropped), save and run, and this is the cropped image. If we try to bring it together with the original, it's basically this portion.
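A short sketch covering the flip and crop steps (the 'photos/park.jpg' path is assumed, as before):

```python
import cv2 as cv

img = cv.imread('photos/park.jpg')  # assumed path

# Flip codes: 0 = vertical, 1 = horizontal, -1 = both
flip = cv.flip(img, -1)
cv.imshow('Flip', flip)

# Crop via array slicing
cropped = img[200:400, 300:400]
cv.imshow('Cropped', cropped)

cv.waitKey(0)
```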
Okay, so that's pretty much it for this video. We talked about translating an image, rotating that image, resizing an image, flipping an image, and cropping those images. We're basically just covering the basic image transformations; there are, of course, way more transformations that you could possibly do with OpenCV, but just to keep this course simple and beginner friendly, I'm only covering the basic ones. So that's it for this video. In the next video, we're going to be talking about how to identify contours in an image. If you have any questions, leave them in the comments below; otherwise, I'll see you guys in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to be talking about how to identify contours in OpenCV. Now, contours are basically the boundaries of objects: the line or curve that joins the continuous points along the boundary of an object. From a mathematical point of view, they're not the same as edges. For the most part, you can get away with thinking of contours as edges, but mathematically, contours and edges are two different things. Contours are useful tools when you get into shape analysis and object detection and recognition. So in this video, I sort of want to introduce you to the idea of contours and how to identify them in OpenCV. The first thing I've done is read in an image file and displayed it using the cv.imshow method. The next thing I want to do is convert this image to grayscale by saying gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY), and we can display that. Just so we're on the same footing, I'm going to run python contours.py, and we get a gray image over here. Now, after this, I want to grab the edges of the image using the Canny edge detector. So I'm going to say canny = cv.Canny; we pass in img, and we give it two threshold values, 125 and 175. We can display this image, calling it 'Canny Edges' and passing in canny. I didn't save it; save it and run, and these are the edges that were there in the image.
Now, the way we find the contours of this image is by using the findContours method. This method returns two things, contours and hierarchies, so we say contours, hierarchies = cv.findContours. This takes in the edges, so canny; it takes in a mode in which to find the contours, which is either cv.RETR_TREE, if you want all the hierarchical contours, cv.RETR_EXTERNAL, if you want only the external contours, or cv.RETR_LIST, if you want all the contours in the image. The next thing we pass in is the contour approximation method; for now, we're going to set this to cv.CHAIN_APPROX_NONE. So let's just have a top-down look at what this function does. Essentially, the cv.findContours method looks at the structuring element, the edges found in the image, and returns two values: contours, which is essentially a Python list of all the coordinates of the contours that were found in the image, and hierarchies, which is really out of the scope of this course, but essentially refers to the hierarchical representation of contours. For example, if you have a rectangle, and inside the rectangle you have a square, and inside of that square you have a circle, this hierarchy is the representation OpenCV uses to express those nested contours. As for the mode: RETR_LIST essentially returns all the contours it finds in the image; RETR_EXTERNAL retrieves only the external contours, the ones on the outside; and RETR_TREE returns all the contours that are in a hierarchical system. For now, I'm just going to set this to RETR_LIST, to return all the contours in the image. Next we have the contour approximation method, which is basically how we want to approximate the contours. CHAIN_APPROX_NONE does nothing; it just returns all of the contour points. Some people prefer to use CHAIN_APPROX_SIMPLE, which essentially compresses the contours that are returned into the simple points that make the most sense. For example, if you have a line in an image: with CHAIN_APPROX_NONE you get all the contour points of that line, while CHAIN_APPROX_SIMPLE takes all of those points and compresses them into the two end points only, because that makes the most sense; a line is defined by only its two end points, and we don't want all the points in between. That, in a nutshell, is what this entire function is doing.
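In code, the contour-finding steps so far look roughly like this; the 'photos/cats.jpg' path is my assumption for the image used here.

```python
import cv2 as cv

img = cv.imread('photos/cats.jpg')  # assumed path

# Edge cascade to use as the basis for finding contours
canny = cv.Canny(img, 125, 175)

# RETR_LIST returns every contour; CHAIN_APPROX_SIMPLE compresses
# redundant points (a straight line becomes just its two end points)
contours, hierarchies = cv.findContours(canny, cv.RETR_LIST,
                                        cv.CHAIN_APPROX_SIMPLE)

print(f'{len(contours)} contour(s) found!')
```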
the points in between. That, in a nutshell is what this entire function is doing. So since cartoons is a list, we can essentially find the number of cartoons that were found by finding the length of this list. So we can print print length of this list. And we can say fair, we can say we can say these many contused. Found. Okay, so let's say that and Ron. And we found 2794 quantos in the image. And this is huge. This is a lot of code who's ever found in the image. So let's do a couple of
things. Let's try to change this chain approx symbol to chain approx none, and see what that does. See how that affects our length. Now there isn't any difference between those two, because I'm guessing that there were no points to compress and sin there are a lot of edges and points in this image. So there wasn't a lot of compression. So let's change the back to symbol. And actually, what we want to do is I want to blow this image before I find the edges. So let's do this. Let's do a blue is equal to CV
dot Gaussian Blur can pass in the gray image. And we can give the kernel size of let's let's do a lot of blur. So five by five. And maybe we can give it by the default of CV dot border on disko default. And we can if you want to, and we can display this image, call this blur and pass an error we can find the edges on this blurred image. So let's close below. And as you can see this significant reduction in the number of Quorn twos that were found just by blurring the image. So
it went all the way from 2794 to 380. That's closest seven times just by blurring the image with the kernel size of five by five. Okay, now there is another way of finding the corner shoes is that it's stead of using this canny edge detector, we can use another function in open CV, and that is threshold. So I'm just going to comment this out. And down here, what I'm going to do is I'm going to say, ret Thresh is equal to CV don't threshold, this will take in the gray image, and we've taken a threshold
value of 125 and a maximum value of 255. I don't worry too much about thresholding. For now, just know that threshold essentially looks at an image and tries to binarize that image. So if a particular pixel is below 125, if the density of that pixel is below 125, it's going to be set to zero or blank. If it is above 125, it is set to white or two by five. That's all it does. And in the find quantities method, we can essentially pass in the thrush value. So let's save that. Let's close this out and
try to run that. Type. Okay. threshold missing. Okay, I think I forgot one part, where to specify a threshold and type. So this is CV dot Thresh. On this go, binary, binary raising the image basically. Okay, let's run that. And there were 839 contours that were found, we can visualize that let's print ad to display this Thresh. Image, passing Thresh. Same that run. And this was the thresholded image you're using 125. close this out, using 125 as our threshold value, and 255 as a maximum value, we got this thresholded image. And when we tried to
find the current use on this image, we got 839 concepts. Now don't worry too much about this thresholding business, we'll discuss this in the advanced section of this goes more in depth just know that thresholding attempts to binarize an image, take an image and convert it into binary form that is either zero or black, or white, or to Vi five. Now what's cool in open CV is that you can actually visualize the contours that were found on the image by essentially drawing over the image. So what do we do real quick is actually input NumPy
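For reference, here's a sketch of the thresholding variant described above:

```python
import cv2 as cv

img = cv.imread('photos/cats.jpg')  # assumed path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Binarize the image: intensities below 125 go to 0 (black),
# intensities above 125 go to 255 (white)
ret, thresh = cv.threshold(gray, 125, 255, cv.THRESH_BINARY)
cv.imshow('Thresh', thresh)

contours, hierarchies = cv.findContours(thresh, cv.RETR_LIST,
                                        cv.CHAIN_APPROX_SIMPLE)
print(f'{len(contours)} contour(s) found!')

cv.waitKey(0)
```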
Now, what's cool in OpenCV is that you can actually visualize the contours that were found by drawing them over an image. So what I'll do real quick is import numpy as np, and after this I'm going to create a blank variable and set it equal to np.zeros of img.shape[:2], the first two values, and maybe give it a data type of 'uint8'. We can display this image, calling it 'Blank', just to visualize it and have a blank image to work with. Let's save that, and we get a blank image of the same dimensions as our original contours image. So what I'm going to do is draw the contours we found on that blank image, so that we know what kind of contours OpenCV found. The way we do that is by using the cv.drawContours method. It takes in an image to draw over, so blank; it takes in the contours, which has to be a list, which in this case is just our contours list; it takes a contour index, which is basically how many of the contours you want to draw, and since we want to draw all of them, we can specify -1; then we give it a color, let's set this to red, (0, 0, 255) in BGR, and we can give it a thickness of maybe 2. And we can display the blank image, calling it 'Contours Drawn'. Save that and run. Okay, there was an error: I think the blank image has to use the full img.shape, including the color channels, so that we can draw on it in color. Okay, so these were the contours that were drawn on the image. If you take a look at the thresholded image, it's not exactly the same thing; what I believe it did is find the edges of that thresholded image and draw those out on this blank image.
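A sketch of the contour visualization; img and contours here come from the earlier contour-finding sketch.

```python
import numpy as np

# blank needs the full img.shape (including channels) to draw in color
blank = np.zeros(img.shape, dtype='uint8')

# -1 draws all the contours, in red, with a thickness of 1
cv.drawContours(blank, contours, -1, (0, 0, 255), 1)

cv.imshow('Contours Drawn', blank)
cv.waitKey(0)
```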
Let's set the thickness to maybe 1, so that we have a crisper view. Okay, so these were the contours that were drawn on the image. In fact, let's try to visualize this against Canny; uncomment that out and run. Okay, a variable was undefined; that has to be the image. Okay, let's look at the Canny edges, and let's look at this: it's not the same thing, and that makes sense, because our findContours method didn't use Canny as the basis for detecting and finding the contours here. But we can do that: let's not use the thresholding method, and instead let's use Canny, so we pass in canny here. Save that and run, and okay, that's pretty much the same thing; these two are basically mirror images of each other. Like I said, you can get away with thinking of contours as edges. They're not the same thing, but from a programming point of view, they look kind of like the edges of the image: they're the boundaries, the curves that join the points along the boundary, and those are basically what edges look like. So let's try to blur that image; uncomment that out and see what that does. I don't think that had any effect, because we didn't pass in the blur. Okay, 380 contours found, and the two are mirror images of each other. So generally, what I recommend is that you use the Canny method first and then find the contours using that, rather than thresholding the image and finding the contours on the threshold. Because, as we'll discuss in the advanced section, this type of thresholding, simple thresholding, has its disadvantages, since we're passing in just one value and binarizing the image against that single threshold. It's not the most ideal, but in many cases it's the favored kind of thresholding, because it's the simplest and it does the job pretty well. So that's pretty much it for this video. We talked about how to identify contours in OpenCV using two methods: first, finding the edge cascade of the image using the Canny edge detector and finding the contours on that, and second, binarizing the image using cv.threshold and finding the contours on that. If you have any questions, leave them in the comments below; I'll be sure to check them out. Otherwise, as always, I'll see you guys in the next video.

Hey everyone, and welcome back to another video. We are now at the advanced section of this course, where we're going to discuss the more advanced concepts in OpenCV.
video is actually discussing how to switch between color spaces in urgency. Our color spaces, basically a space of colors, a system of representing an array of pixel colors. RGB is a kind of space grayscale is color space. We also have other color spaces like HSV, lamb, and many more. So let's start off with trying to convert this image to grayscale. So we're going to convert from a BGR image which is open CV is default way of reading and images. And we're going to convert that to grayscale. So the way we do that is by saying
gray is equal to CV dot CBT color. We pass in the image and we specify a color code, which is CV dot color, underscore BGR to to grip since we're converting from a BGR image format to grayscale format, and we can display this image I st gray and passing in grip. Let's save that and run Python spaces dot p y. We had a problem as a comma, Save and Run. And this is the grayscale version of this BGR image. Cool pretty cool. grayscale images basically show you the distribution of pixel intensities at particular locations of
So next, let's try to convert this image to an HSV format, so from BGR to HSV. HSV is also called hue-saturation-value, and it's kind of based on how humans think of and conceive color. The way we do that is by saying hsv is equal to cv.cvtColor; we pass in the img variable, and we specify a color code, which is cv.COLOR_BGR2HSV. And we can display this image, call it 'HSV', and pass in hsv. Let's save that. And this is the HSV version of this BGR image. As you can see, there's a lot of green in this area, and the skies are reddish. Now, we also have another kind of color space, and that is the LAB color space. So we're going to convert from BGR to LAB. This is sometimes written as L*a*b*, but feel free to use whichever you want. So lab is equal to cv.cvtColor; we pass in the img and the color code cv.COLOR_BGR2LAB, and we display it with imshow, call it 'LAB', and pass in lab. Run that, and this is the LAB version of this BGR image. It kind of looks like a washed-down version of the BGR image, but hey, that's the LAB format; it's more tuned to how humans perceive color.
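Both conversions follow the exact same pattern. A rough sketch, reusing the img variable from before (the path is still a placeholder):

```python
import cv2 as cv

img = cv.imread('Photos/park.jpg')  # placeholder path

# BGR to HSV (hue, saturation, value)
hsv = cv.cvtColor(img, cv.COLOR_BGR2HSV)
cv.imshow('HSV', hsv)

# BGR to LAB (often written L*a*b*)
lab = cv.cvtColor(img, cv.COLOR_BGR2LAB)
cv.imshow('LAB', lab)

cv.waitKey(0)
```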
Now, when I started off with this course, I mentioned that OpenCV reads in images in a BGR format, that is blue, green and red, and that's not the system we use to represent colors outside of OpenCV. Outside of OpenCV, we use the RGB format, which is kind of like the inverse of the BGR format. So if you try to display this img in a Python library that's not OpenCV, you're probably going to see an inversion of colors. And we can demo that real quick. Let's import matplotlib.pyplot as plt, and let's try to display this image variable. So we're going to say plt.imshow, pass in the image, and then plt.show. Let's comment the other displays out, save that and run. And this is the image you get. Now, if you compare it with the image that OpenCV read, these two are completely different. And the reason is that this image is a BGR image, and OpenCV displays BGR images just fine. But if you take this BGR image and try to display it in matplotlib, for instance, matplotlib has no idea that it's a BGR image and displays it as if it were an RGB image. That's why you see an inversion of colors: where there's red over here, you see blue; where there's blue over here, you see red. And there are ways to convert from BGR to RGB, using OpenCV itself. So let's comment that out, uncomment this, and right over here, let's say BGR to RGB. What we're going to say is rgb is equal to cv.cvtColor; we pass in the BGR image, and we specify a color code, which is cv.COLOR_BGR2RGB. And we can display this image in OpenCV to see what that shows, and we can also display it in matplotlib, so I've passed in rgb and we can do plt.show. Save that and run python spaces.py. What I'm most interested in is these two. Now, again, you see an inversion of colors, but this time in OpenCV, because now you provided OpenCV an RGB image and it assumed it was a BGR image. That's why there's an inversion of colors there. But we passed the RGB image to matplotlib, and matplotlib's default is RGB, so it displayed the proper image. So just keep this in mind when you're working with multiple libraries, including OpenCV and matplotlib, for instance; do keep in mind the inversion of colors that tends to take place between these two libraries.
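Here's a short sketch of that demo, assuming img is the BGR image from before:

```python
import cv2 as cv
import matplotlib.pyplot as plt

img = cv.imread('Photos/park.jpg')  # placeholder path; loaded in BGR order

# matplotlib assumes RGB, so the raw BGR array shows inverted colors
plt.imshow(img)
plt.show()

# Convert BGR to RGB so matplotlib displays it correctly
rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(rgb)
plt.show()

# Conversely, OpenCV now shows the inversion, since imshow assumes BGR
cv.imshow('RGB in OpenCV', rgb)
cv.waitKey(0)
```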
So now, we've essentially converted BGR to grayscale, BGR to HSV, BGR to LAB, and BGR to RGB. What we can also do is the inverse of that: we can convert a grayscale image to BGR, an HSV image to BGR, a LAB image to BGR, an RGB image to BGR, and so on. But here's one of the downsides: you cannot convert a grayscale image to HSV directly. If you want to do that, what you have to do is convert the grayscale to BGR, and then from BGR to HSV. So let's do the reverse conversions real quick. First, HSV to BGR: hsv_bgr, basically converting from HSV to BGR, is equal to cv.cvtColor; this takes in the HSV image, and the color code will be cv.COLOR_HSV2BGR. And we can display this image, call it 'HSV --> BGR', and pass in hsv_bgr. Save that and run. Okay, we're not interested in the rest, so let's close those out. But essentially, this is the HSV-to-BGR image: this was the HSV image, and we converted it back to BGR. And we can try the same thing with LAB. So let's call this lab_bgr, copy that and paste it, and use cv.COLOR_LAB2BGR; we can get rid of the matplotlib part, since we've already addressed that. Save and run. Okay, that was a mistake: I'd left it as HSV2BGR instead of LAB2BGR, my mistake. Cool. So if this was the LAB version, this is the LAB-to-BGR version, from BGR to LAB and from LAB back to BGR. So that's pretty much it for this video. We discussed how to convert between color spaces, from BGR to grayscale, HSV, LAB and RGB. And if you want to convert from grayscale to LAB, for instance, note that there's no direct method; what you can do is convert that grayscale to BGR, and then from BGR to LAB. Doing it directly, I don't think there's a way to do that. If OpenCV came up with a feature like that, it would be nice, but it's not going to hurt you to write an extra two or three lines of code; it's not that hard.
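For instance, here's a rough sketch of that two-step grayscale-to-HSV workaround; the same pattern applies for grayscale to LAB:

```python
import cv2 as cv

img = cv.imread('Photos/park.jpg')  # placeholder path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# No direct GRAY-to-HSV conversion exists, so go through BGR first
gray_bgr = cv.cvtColor(gray, cv.COLOR_GRAY2BGR)
gray_hsv = cv.cvtColor(gray_bgr, cv.COLOR_BGR2HSV)

cv.imshow('Gray -> BGR -> HSV', gray_hsv)
cv.waitKey(0)
```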
In the next video, we'll be talking about how to split and merge color channels in OpenCV. If you have any questions, leave them in the comments below. Otherwise, I'll see you guys in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to be talking about how to split and merge color channels in OpenCV. Now, a color image basically consists of multiple channels: red, green and blue. All the images you see around you, all the BGR or RGB images, are basically these three color channels merged together. OpenCV allows you to split an image into its respective color channels, so you can take a BGR image and split it into blue, green and red components. That's what we're going to be doing in this video: we're going to take this image of the park that we've seen in previous videos and split it into its three color channels. The way we do that is by saying b, g, r, which stand for the respective color channels, and setting this equal to cv.split of the image. So cv.split basically splits the image into blue, green and red. And we can display these by saying cv.imshow; let's call this 'Blue' and pass in b, and let's do the same for the green image and pass in g, and the same for the red part too, passing in r. And we can actually visualize the shapes of these images. So let's first print the img.shape, and then print the b.shape, and then print the g.shape, and then print the r.shape.
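A quick sketch of the split, with the park image path as a placeholder:

```python
import cv2 as cv

img = cv.imread('Photos/park.jpg')  # placeholder path

# Split the BGR image into its three channels
b, g, r = cv.split(img)

# Each channel is single-channel, so imshow renders it as grayscale
cv.imshow('Blue', b)
cv.imshow('Green', g)
cv.imshow('Red', r)

print(img.shape)  # e.g. (height, width, 3): three color channels
print(b.shape)    # (height, width): no channel dimension
print(g.shape)
print(r.shape)

cv.waitKey(0)
```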
Basically, we're printing the shapes and dimensions of the image and of the blue, green and red channels, and we're also displaying these images. So let's run python split_merge.py. And these are the images you get back: this is the blue image, this is the green image, and this is the red image. Now, these are depicted and displayed as grayscale images that show the distribution of pixel intensities. Regions where it's lighter show a far higher concentration of those pixel values, and regions where it's darker represent little or even none of that color in the region. So take a look at the blue channel first. If you compare it with the original image, you'll see that the sky is almost white; this basically shows you that there's a high concentration of blue in the sky, and not so much in the trees or the grass. Let's take a look at the green: there's a fairly even distribution of pixel intensities between the grass, the trees and some parts of the sky. And take a look at the red channel: you can see that the parts of the trees that are red are whiter, while the grass and the sky are not that white in this red image, so there isn't much red in those regions. Now, coming back, let's take a look at the shapes. This one is the original image, the BGR image; the additional element in the tuple represents the number of color channels, three, for blue, green and red. Now, if we look at the shapes of the b, g and r components, we don't see a three in the tuple. That's because each component has only one channel; it's not mentioned in the tuple, but it is one. That's why, when you try to display such an image using cv.imshow, it's displayed as a grayscale image, because grayscale images have a single channel. Now, let's try and merge these color channels back together. The way we do that is by saying merged is equal to cv.merge, and we pass in a list of b, g, r. Save that, and let's display it: cv.imshow, call this the merged image, and we can pass in merged. So let's save that and run. And we get back the merged image, by basically merging the three individual color channels, blue, green and red.
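Merging is the mirror of the split. A minimal sketch:

```python
import cv2 as cv

img = cv.imread('Photos/park.jpg')  # placeholder path
b, g, r = cv.split(img)

# Recombine the three single-channel images into one BGR image
merged = cv.merge([b, g, r])
cv.imshow('Merged Image', merged)

cv.waitKey(0)
```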
Now, there's an additional way of looking at the actual color in each channel. Instead of showing grayscale images, we can show the actual color involved: for the blue channel you get the blue color, for the red channel you get the red color, and so on. And the way we do that is by reconstructing the image. These channel images are basically grayscale, but what we can do is create a blank image using NumPy. Essentially, we're going to say blank is equal to np.zeros, and we're going to set this to the shape of the image, but only the first two values, and give it a datatype of uint8, which is basically the datatype for images. And to show the blue color channel, what we're going to do is say, down here, blue is equal to cv.merge, and we pass in the list b, blank, blank. We're going to do the same thing for green, setting it equal to cv.merge of blank, g, blank, and the same thing for red, setting it equal to cv.merge of blank, blank, r. Basically, this blank image consists of just the height and the width, not the number of color channels. So by merging the blue channel into its respective slot, we're setting the green and red components to black and only displaying the blue channel. We're doing the same thing for the green by setting the blue and red components to black, and the same for the red by setting the blue and green components to black. And we can display these as 'Blue', 'Green' and 'Red'. Let's save that and run, and now you actually get the color in its respective channel. Take a look at this: you can now visualize the distribution much better. Lighter portions here represent a high distribution of blue, lighter portions here represent a high distribution of red, and whiter regions here represent a high distribution of green. So essentially, if you take these three images of the color channels and merge them together, you get back the merged image. That's the merged image.
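Here's a sketch of that reconstruction trick, again assuming the img variable from before:

```python
import cv2 as cv
import numpy as np

img = cv.imread('Photos/park.jpg')  # placeholder path
b, g, r = cv.split(img)

# Single-channel black image with the same height and width as img
blank = np.zeros(img.shape[:2], dtype='uint8')

# Place each channel in its BGR slot, blacking out the other two
blue = cv.merge([b, blank, blank])
green = cv.merge([blank, g, blank])
red = cv.merge([blank, blank, r])

cv.imshow('Blue', blue)
cv.imshow('Green', green)
cv.imshow('Red', red)

cv.waitKey(0)
```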
So that's pretty much it for this video. We discussed how to split an image into its three respective color channels, how to reconstruct the image to display the actual color involved in each channel, and how to merge those color channels back into the original image. In the next video, we'll be talking about how to smooth and blur an image using various blurring techniques. If you have any questions, leave them in the comments below. Otherwise, I'll see you guys in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to address the concepts of smoothing and blurring in OpenCV. Now, I mentioned before that we generally smooth an image when it tends to have a lot of noise: noise caused by camera sensors, or by problems in lighting when the image was taken. We can essentially smooth out the image, or reduce some of the noise, by applying a blurring method. Previously, we discussed the Gaussian blur method, which is one of the most popular blurring methods. But generally, you're going to find that Gaussian blur won't always suit your purposes, and that's why there are many blurring techniques available, which is what we're going to address in this video.
Now, before we actually do that, I do want to address a couple of concepts. Let's actually go to an image and discuss what exactly goes on when you try to apply blur. Essentially, the first thing we need to define is something called a kernel, or window: a window that you draw over a specific portion of an image, where something happens to the pixels inside it. This window has a size, called the kernel size, which is basically the number of rows and the number of columns. So over here we have three columns and three rows, so the kernel size is three by three. Now, we have multiple methods for applying blur, but in each of them, blur is applied to the middle pixel as a result of the pixels around it, also called the surrounding pixels. So with that in mind, let's go back and discuss the first method of blurring, which is averaging. In averaging, we define a kernel window over a specific portion of the image, and this window computes the pixel intensity of the middle pixel, the true center, as the average of the surrounding pixel intensities. So suppose this pixel intensity was one, this was two, and these were 3, 4, 5, 6, 7, 8; you get the point. The new pixel intensity for this region will be the average of all the surrounding pixel intensities: summing 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 and dividing by eight, which is the number of surrounding pixels. We use that result as the pixel intensity for the middle value, the true center. And this process happens throughout the image: the window slides to the right, and once that's done, it slides down, and the average is computed for all the pixels in the image.
So let's try to apply this and see what it does. What we're going to do is say average is equal to cv.blur; the cv.blur method is the one that applies an averaging blur. We pass in the source image, which is img, and we give it a kernel size of, let's say, three by three, and that's it. We can display this image as 'Average Blur'. Save that and run python smoothing.py. Oh gosh, we have to pass in average; save that and run. And this is basically the averaging blur applied. What the algorithm did in the background was define a kernel window of the specified size, three by three, and compute the center value of each pixel as the average of all the surrounding pixel intensities, and the result is a blurred image. The higher the kernel size we specify, the more blur there's going to be in the image. So let's increase that to seven by seven and see what that does. And we get an image with way more blur.
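A minimal sketch of the averaging blur; the photo path is a placeholder:

```python
import cv2 as cv

img = cv.imread('Photos/cats.jpg')  # placeholder path
cv.imshow('Cats', img)

# Averaging blur: each pixel becomes the mean of its kernel neighborhood
average = cv.blur(img, (3, 3))
cv.imshow('Average Blur', average)

# A larger kernel produces a stronger blur
average7 = cv.blur(img, (7, 7))
cv.imshow('Average Blur 7x7', average7)

cv.waitKey(0)
```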
So let's move on to the next method, which is the Gaussian blur. Gaussian basically does the same thing as averaging, except that instead of computing the plain average of all the surrounding pixel intensities, each surrounding pixel is given a particular weight, and the average of the products of those weights and intensities gives you the value for the true center. Using this method, you tend to get less blurring compared to the averaging method, but the Gaussian blur looks more natural. So let's write that out. Let's call this gauss and set it equal to cv.GaussianBlur. This takes in the source image, img, a kernel size of seven by seven, just to compare with the averaging, and another parameter we need to specify: sigmaX, basically the standard deviation in the x direction, which for now we're just going to set to zero. And we can display it, call this 'Gaussian Blur', and pass in gauss. Save that and run. If you compare the two, you'll see that both use the same kernel size, but this one is less blurred compared to the averaging method.
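Roughly, side by side with the averaging version:

```python
import cv2 as cv

img = cv.imread('Photos/cats.jpg')  # placeholder path

# Gaussian blur: neighbors are weighted, giving a more natural result.
# The last argument is sigmaX, the standard deviation in the x direction.
gauss = cv.GaussianBlur(img, (7, 7), 0)
cv.imshow('Gaussian Blur', gauss)

cv.waitKey(0)
```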
And the reason for this is that a weighting was applied when computing the blur. Okay, so let's move on to the next method, and that is the median blur. Median blurring is basically the same thing as averaging, except that instead of finding the average of the surrounding pixels, it finds the median of the surrounding pixels. Generally, median blurring tends to be more effective at reducing noise in an image compared to averaging and even Gaussian blur, and it's pretty good at removing salt-and-pepper noise that may exist in the image. In general, people tend to use median blurring in advanced computer vision projects that depend on reducing a substantial amount of noise. So let's go back here. The way we apply this blur is by saying, let's call this median, and set it equal to cv.medianBlur. We pass in the source image, and the kernel size will not be a tuple of three by three but instead just the integer three. The reason for this is that OpenCV automatically assumes the kernel size will be three by three based on that integer. And we can display this as 'Median Blur' and pass in median. Let's compare it at seven: comparing with the Gaussian blur and the averaging blur, you can make out some differences between the images. It's as if this was a painting with the paint still wet, and you took something and smudged it over the image. Now, generally, median blurring is not meant for high kernel sizes like seven, or even five in some cases; it's more effective at reducing some of the noise in the image with smaller kernels. So let's change everything to three by three, and change the median one to just three. And now let's have a comparison between the three: this is your Gaussian blur, this is your averaging blur, and this is your median blur.
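A sketch of the median blur; note that the kernel size is a single integer here:

```python
import cv2 as cv

img = cv.imread('Photos/cats.jpg')  # placeholder path

# Median blur: each pixel becomes the median of its neighborhood.
# The kernel size is a plain integer; OpenCV treats 3 as a 3x3 window.
median = cv.medianBlur(img, 3)
cv.imshow('Median Blur', median)

cv.waitKey(0)
```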
Compared with those two, you can see there's somewhat less blurring with the median; you can sort of make out the differences between them, very subtle, but there are a couple of differences. Finally, the last method we're going to discuss is bilateral blurring, applied with a bilateral filter. Now, bilateral blurring is the most effective, and it's used in a lot of advanced computer vision projects, essentially because of how it blurs. Traditional blurring methods basically blur the image without caring whether they're reducing the edges in the image or not; bilateral blurring applies blur but retains the edges. So you get a blurred image, but you keep the edges as well. So let's call this bilateral and set it equal to cv.bilateralFilter. We pass in the image, and we give it a diameter of the pixel neighborhood; notice this isn't a kernel size, but in fact a diameter. Let's set this to five for now. Then we give it a sigmaColor, which is basically the color sigma: a larger value means that more colors in the neighborhood will be considered when the blur is computed. Let's set this to 15 for now. And sigmaSpace is basically your space sigma: larger values mean that pixels further out from the central pixel will influence the blurring calculation. Let's think about that sigma space for a second. In bilateral filtering, when the value for the central pixel, the true center, is being computed, giving larger values for the sigma space essentially indicates whether you want pixels this far away, or maybe this far away, or even this far away, to influence that particular calculation. So if you give it a really huge number, then a pixel way out in this region might influence the computation of this pixel's value. So let's set this to 15 for now, and let's display this image: cv.imshow, call it 'Bilateral', and pass in bilateral. Save that and run. And this is your bilateral image. So let's compare it with all the previous ones: compared with Gaussian, much better; compared with averaging, way better; let's compare with median.
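A rough sketch with the starting values used here:

```python
import cv2 as cv

img = cv.imread('Photos/cats.jpg')  # placeholder path

# Bilateral filter: blurs while trying to preserve edges.
# Arguments: 5 is the pixel-neighborhood diameter (not a kernel size),
# then sigmaColor=15 and sigmaSpace=15.
bilateral = cv.bilateralFilter(img, 5, 15, 15)
cv.imshow('Bilateral', bilateral)

cv.waitKey(0)
```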
The edges are slightly blurred, but if you compare it with the original image, they look about the same; it almost looks like no blur was applied at all. So maybe let's increase this diameter to, I don't know, ten. Not much changed; the edges are still there, and it still looks like the original image. So let's try tuning the other parameters: let's set this to 35, and this to 25; we're just playing around with these values. And now you can basically make out that this is starting to look a lot like the median blur. With even larger values, it starts to look like a smudged-painting version of the image: there's a lot of blur applied here, but it ends up looking smudged. So definitely keep that in mind when you're applying blur to an image, especially with the bilateral and median blurring, because with higher values of the diameter for bilateral, or the kernel size for median blurring, you tend to end up with a washed-out, smudged version of the image. So definitely keep that in mind. But that kind of summarizes what we've done in this video: we discussed averaging, Gaussian, median and bilateral blurring. In the next video, we'll be talking about bitwise operators in OpenCV. So again, like always, if you have any questions, leave them in the comments below. Otherwise, I'll see you guys in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to be talking about bitwise operators in OpenCV. Now, there are four basic bitwise operators: AND, OR, XOR and NOT. If you've ever taken an introductory CS course, you'll probably find these terms familiar.
Bitwise operators are in fact used a lot in image processing, especially when we're working with masks, like we will in the next video. At a very high level, bitwise operators operate in a binary manner: a pixel is turned off if it has a value of zero, and turned on if it has a value of one. So let's go ahead and import numpy as np. What I'm going to do is create a blank variable and set it equal to np.zeros of size 400 by 400, with a datatype of uint8. Then I'm going to use this blank variable as a basis to draw a rectangle and a circle. So I'm going to say rectangle is equal to cv.rectangle, drawn on blank.copy(). We pass in the starting point; let's give it a margin of around 30 pixels on either side, so we start from (30, 30) and go all the way across to (370, 370). And we give it a color: since this is not a color image but rather a binary image, we can just give it one value, 255, white, and a thickness of negative one, because we want to fill the shape. Then I'm going to create another circle variable and set it equal to cv.circle, again on blank.copy(). We give it a center, which will be the absolute center, (200, 200), a radius of 200, a color of 255, and we fill in the circle with a thickness of negative one. So let's display these images and see what we're working with: call this 'Rectangle' and pass in the rectangle, and do the same thing with the circle, call it 'Circle' and pass in the circle. Save that and run python bitwise.py. So we have two images to work with: this rectangle and this circle.
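A quick sketch of that setup:

```python
import cv2 as cv
import numpy as np

# 400x400 single-channel black canvas
blank = np.zeros((400, 400), dtype='uint8')

# Filled white rectangle with a 30-pixel margin on each side
rectangle = cv.rectangle(blank.copy(), (30, 30), (370, 370), 255, -1)

# Filled white circle centered on the canvas
circle = cv.circle(blank.copy(), (200, 200), 200, 255, -1)

cv.imshow('Rectangle', rectangle)
cv.imshow('Circle', circle)
cv.waitKey(0)
```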
So let's start off with the first basic bitwise operator, and that is bitwise AND. Before we discuss what bitwise AND really is, let me show you what it does. Essentially, I'm going to say bitwise_and is equal to cv.bitwise_and, and what I have to do is pass in the two source images, the rectangle and the circle. Now we can display this image, call it 'Bitwise AND', pass in bitwise_and, and save and run. And essentially, you get back this image. What bitwise AND did was take these two images, place them on top of each other, and return the intersection. You can make out that when you put this rectangle over this circle, there are some corner regions that are not common to both images, and those are set to black, to zero, while the common regions are returned. The next one is bitwise OR. Bitwise OR simply returns both the intersecting as well as the non-intersecting regions. So let's try this: bitwise_or is equal to cv.bitwise_or; we pass in the rectangle and we pass in the circle. Now we can display that, call it 'Bitwise OR', and pass in bitwise_or. Save that and run. And bitwise OR returned this funky-looking shape. Essentially, what it did is it took these two images, put them over each other, found the common regions and also the regions that are not common to both, and superimposed them. Basically, you put the two shapes together and you get the resulting shape: this shape over this one, and you get this. Moving on, the next one is bitwise XOR, which returns the non-intersecting regions. So AND found the intersecting regions, OR brought back both the intersecting and non-intersecting regions, and XOR only finds the non-intersecting regions. So let's do that: bitwise_xor is equal to cv.bitwise_xor; we pass in the rectangle and we pass in the circle, and we can display this with cv.imshow, call it 'Bitwise XOR', and pass in bitwise_xor. Save that and run. And here we have the non-intersecting regions of these two images when you put them over each other. Pretty cool. And just to recap: bitwise AND returns the intersecting regions; bitwise OR returns the non-intersecting regions as well as the intersecting regions; and bitwise XOR returns only the non-intersecting regions. So essentially, if you take this bitwise XOR and subtract it from the bitwise OR, you get bitwise AND; and conversely, if you subtract bitwise AND from the bitwise OR, you get bitwise XOR. That's a good way of visualizing what exactly happens with these bitwise operators. And finally, the last method we can discuss is bitwise NOT. It doesn't combine anything; what it does is invert the binary color. So let's do that: bitwise_not is equal to cv.bitwise_not, and this only takes in one source image, so let's pass in the rectangle. And we can display this, call it 'Rectangle NOT', and pass in bitwise_not. Save that and run. Basically, what it did is it found all the white pixels in the image and inverted them to black, and all the black pixels it inverted to white. We can try that with the circle too: let's call this 'Circle NOT' and pass in the circle here. Save and run, and the resulting inverted circle you get is this. This was a white hole; this is a
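All four operators in one sketch, reusing the rectangle and circle setup from above:

```python
import cv2 as cv
import numpy as np

blank = np.zeros((400, 400), dtype='uint8')
rectangle = cv.rectangle(blank.copy(), (30, 30), (370, 370), 255, -1)
circle = cv.circle(blank.copy(), (200, 200), 200, 255, -1)

bitwise_and = cv.bitwise_and(rectangle, circle)  # intersecting regions
bitwise_or = cv.bitwise_or(rectangle, circle)    # intersecting + non-intersecting
bitwise_xor = cv.bitwise_xor(rectangle, circle)  # non-intersecting regions only
bitwise_not = cv.bitwise_not(rectangle)          # inverts white and black

cv.imshow('Bitwise AND', bitwise_and)
cv.imshow('Bitwise OR', bitwise_or)
cv.imshow('Bitwise XOR', bitwise_xor)
cv.imshow('Rectangle NOT', bitwise_not)
cv.waitKey(0)
```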
black hole, for the physicists out there. Okay, so that's pretty much it for this video. I just wanted to introduce you all to the idea of bitwise operations and how they work. In the next video, we'll actually be talking about how to use these bitwise operations in a concept called masking. So if you have any questions, leave them in the comments below. Otherwise, I'll see you guys in the next video.

Hey everyone, and welcome back. In this video, we're going to be talking about masking in OpenCV. Now, in the previous video, we discussed bitwise operations, and using those bitwise operations, we can essentially perform masking in OpenCV. Masking allows us to focus on certain parts of an image that we'd like to focus on. So, for example, if you have an image with people in it and you're interested in focusing on the faces of those people, you could apply masking: mask over the people's faces and remove all the unwanted parts of the image. So that's basically our high-level intuition behind this. Let's actually see how this works in OpenCV. I've basically read in a file and displayed that image. The other thing I'm going to do is import numpy as np, and then say blank is equal to np.zeros, with the size img.shape, taking only the first two values. Now, this is extremely important: the dimensions of the mask have to be the same size as those of the image. If they aren't, it's not going to work. And we give it a datatype of uint8.
If you want to display this, you can; it's just going to be a black image, call it 'Blank Image' and pass in blank. Essentially, what I'm going to do is draw a circle over this blank image and call that my mask. So I'm going to say mask is equal to cv.circle, drawn on the blank image. We give it a center, the center of the image, so img.shape[1] divided by two and img.shape[0] divided by two, a radius of, I don't know, let's say 100 pixels, a color of 255, and a thickness of negative one. And we can visualize the mask as 'Mask', passing in mask. So let's run python masking.py. And this is essentially our mask, this is the blank image we're working with, and this is the image that we want to mask over. So let's actually create a masked image. We're going to say masked is equal to cv.bitwise_and, with the source image passed in twice, img, img, and we specify the keyword parameter mask equal to mask, which is this circle image over here. And we can display this image, call it 'Masked Image', and pass in masked. Save that and run. And this is essentially your masked image: you took this image, put this circle mask over it, and found the intersecting region. By optionally passing in mask=mask, that's exactly what we're doing.
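A minimal sketch of the whole masking flow; the image path is a placeholder:

```python
import cv2 as cv
import numpy as np

img = cv.imread('Photos/cats.jpg')  # placeholder path

# The mask must match the image's height and width (single channel)
blank = np.zeros(img.shape[:2], dtype='uint8')

# A white filled circle at the image center acts as the mask
mask = cv.circle(blank, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)

# bitwise AND of the image with itself, restricted to the mask region
masked = cv.bitwise_and(img, img, mask=mask)
cv.imshow('Masked Image', masked)

cv.waitKey(0)
```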
Cool. Now, let's play around with this. Let's move the circle's center by a couple of pixels, say 45. Save and run; okay, that moved it the wrong way, it has to be plus 45. Save and run, and now we get the cat in the image. We can also draw a rectangle instead of a circle: draw it on blank, give it a starting point, then copy the center and add a couple of pixels, maybe 100 pixels this way and 100 pixels that way; we can get rid of the radius, we don't need that, and save. So this is the square, and this is essentially the masked image. Let's also try this with a different image; let's try it with these cats, cats2. Save that and run, and this is the masked result we get by putting these two over each other. Essentially, you can play around with these as you see fit. You can maybe try different shapes, weird shapes even, and the way you get these weird shapes is by creating a circle and a rectangle, applying bitwise AND, and then using that weird shape as your mask. So let's just try that. Let's call this circle, on blank.copy(), and create a rectangle; let's just grab the rectangle from the bitwise script, copy that over, (30, 30), same blank shape. Then let's create this weird_shape, equal to cv.bitwise_and of this circle and this rectangle, and we don't need to specify anything else. Let's visualize it: cv.imshow, call this 'Weird Shape', pass in the weird shape, and run. Oops, masked is undefined, that was mask; okay, good. This is the weird shape that we get. We weren't really going for a half moon, but hey, whatever. Let's close this out and use this weird shape as the mask, and let's see the final masked image. And this is essentially your weird-shape masked image, this little half moon here. Essentially, you can do pretty much anything you want with this; you can experiment with various shapes and sizes and things like that. But just know that the size of your mask has to have the same dimensions as those of your image. If you want to see why, let's give the blank image a different size; why are we even using the image shape, let's go with a size of 300 by 300, definitely not the size of this image. And we get 'assertion failed ... masks must be the same size, in function ...', whatever. So essentially, these need to be the same size; otherwise, it's going to fail and throw you an error. So that's it for this video. We talked about masking; again, nothing too different, we've essentially used the concept of bitwise AND from the previous video.
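Here's a sketch of building that combined-shape mask; the center offsets and rectangle corners below are just illustrative choices, not the exact values from the walkthrough:

```python
import cv2 as cv
import numpy as np

img = cv.imread('Photos/cats2.jpg')  # placeholder path
blank = np.zeros(img.shape[:2], dtype='uint8')

# Combine a circle and a rectangle with bitwise AND to get a custom shape
circle = cv.circle(blank.copy(),
                   (img.shape[1] // 2 + 45, img.shape[0] // 2), 100, 255, -1)
rectangle = cv.rectangle(blank.copy(), (30, 30),
                         (img.shape[1] // 2, img.shape[0] // 2), 255, -1)
weird_shape = cv.bitwise_and(circle, rectangle)

# Use the combined shape as the mask
masked = cv.bitwise_and(img, img, mask=weird_shape)
cv.imshow('Weird Shape Masked Image', masked)

cv.waitKey(0)
```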
And you'll really see where masking comes into play when we move on to computing histograms in the next video, and how masking affects your histograms. So if you have any questions, again, leave them in the comments below. Otherwise, I'll see you in the next video.

Hey everyone, and welcome back to another video. In this video, we're going to be talking about computing histograms in OpenCV. Now, histograms basically allow you to visualize the distribution of pixel intensities in an image. Whether it's a color image or a grayscale image, you can visualize these pixel intensity distributions with the help of a histogram, which is kind of like a graph or plot that gives you a high-level intuition of the pixel distribution in the image. We can compute a histogram for grayscale images and for RGB images, and we're going to start off with computing histograms for grayscale images. So let's just convert this image to grayscale: gray is cv.cvtColor, pass in the image, and give it a color code of cv.COLOR_BGR2GRAY; let's display this image as 'Gray' and pass in gray. Now, to actually compute the grayscale histogram, what we need to do is call this gray_hist and set it equal to cv.calcHist. This method essentially computes the histogram for the image we pass in. Now, the images argument is a list, so we need to pass in a list of images; since we're only interested in computing a histogram for one image, we just pass in the grayscale image wrapped as a list. The next thing we pass in is the channels, which basically specify the index of the channel we want to compute a histogram for; since we're computing the histogram for a grayscale image, we wrap this as a list and pass in zero. The next thing is the mask: do we want to compute a histogram for a specific portion of the image? We'll get to this later, but for now just set it to None. histSize is basically the number of bins we want to use when computing the histogram; when we plot it, I'll talk about this concept of bins, but for now just set this to 256, wrapped as a list. And the last thing is the ranges, the range of all possible pixel values, which for our case is [0, 256]. And that's it. Now, to plot this histogram, let's actually use matplotlib: import matplotlib.pyplot as plt, and then we can instantiate a plt.figure. Let's give it a title, call it 'Grayscale Histogram'; give it a label across the x-axis, 'Bins'; and give it a y-label of '# of pixels'. And finally, we can plot by saying plt.plot of the grayscale histogram, give it a limit across the x-axis with plt.xlim of [0, 256], and display the plot with plt.show. Save that and run python histogram.py. And this is the distribution of pixels in this image.
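Putting that together, a rough sketch, with the image path again a placeholder:

```python
import cv2 as cv
import matplotlib.pyplot as plt

img = cv.imread('Photos/cats.jpg')  # placeholder path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# calcHist takes: list of images, channel indices, mask, bin count, range
gray_hist = cv.calcHist([gray], [0], None, [256], [0, 256])

plt.figure()
plt.title('Grayscale Histogram')
plt.xlabel('Bins')
plt.ylabel('# of pixels')
plt.plot(gray_hist)
plt.xlim([0, 256])
plt.show()
```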
As you can see, the bins across the x-axis basically represent the intervals of pixel intensities. You can see there's a peak in this region, close to 50-60 ish; this means that in this image, there are close to 4,000 pixels with an intensity of around 60. And there's a lot of peaking in this region, so between roughly 40 and 70, there are peaks of close to 3,000 pixels per intensity in this image. So let's try this with a different image; let's try it with the cats. I'm just going to save that and run. And there's a peak of pixel values between 200 and 225. This makes sense, because most of the image is white, so for that reason you could deduce that there would be a peak towards white, towards 255. So this is computing the grayscale histogram for the entire image. What we can also do is create a mask, and then compute the histogram only over that particular mask. So let's do that. Let's go back to the masking script and grab the blank image: set it to img.shape with the first two values, so the sizes are the same. Then let's draw a mask, which will be cv.circle on blank, with the center of the image, img.shape[1] divided by two and img.shape[0] divided by two, a radius of 100 pixels, a color of 255, and a thickness of negative one. We can display the mask, call it 'Mask', and pass in mask. And here's where things get interesting: we can get the grayscale histogram for this mask, and the way we do that is by setting the mask parameter in calcHist to the mask instead of None. Let's see what that does to our histogram. Oops, undefined; right, I kind of made a mistake here. This isn't exactly the mask, it's the circle. We still need to mask out the image, and the way we do that is by creating masked and setting it equal to cv.bitwise_and; we pass in the grayscale image twice, and we pass in mask equal to circle. Now we can use that as the mask. Sorry, I made a mistake, but hopefully things should be fine right now. So this is the mask, and this is the histogram computed for that particular masked region.
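A sketch of the masked grayscale histogram; note that the single-channel circle itself serves as the mask argument:

```python
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

img = cv.imread('Photos/cats.jpg')  # placeholder path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Single-channel circular mask at the image center
blank = np.zeros(img.shape[:2], dtype='uint8')
circle = cv.circle(blank, (img.shape[1] // 2, img.shape[0] // 2), 100, 255, -1)

# Passing the mask restricts the histogram to that region
gray_hist = cv.calcHist([gray], [0], circle, [256], [0, 256])

plt.figure()
plt.title('Grayscale Histogram (masked)')
plt.plot(gray_hist)
plt.xlim([0, 256])
plt.show()
```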
As you can see, there's a peak of pixel intensity values in this region, and there are smaller peaks in the regions down below. Let's try this with another image; let's pass in the cats, cats2.jpg. This is our mask, and in this image there's a peak towards 50. Okay, so that was computing grayscale histograms. Let's move on to computing a color histogram, that is, a histogram for a color image, an RGB image. So let's call this section the color histogram. The way we do this is, instead of converting this image to grayscale, let's comment all of that out; we'll use a mask later, and the masked image will be built from img, img. So let's start with the color histogram. Let's define a tuple of colors and set it equal to ('b', 'g', 'r'). And what I'm going to do next is say: for i, col in enumerate(colors). Inside the loop, I'm going to compute the histogram by saying hist equals cv.calcHist; we compute it over the image itself, the channels will be [i], this i over here, we provide a mask of None for now, give it a histSize of [256], and give it ranges of [0, 256]. Then we do a plt.plot of hist and give it a color equal to col, and we can do a plt.xlim of [0, 256]. For the plot setup, we can essentially grab the figure code from before, copy that, and uncomment it, and then we can do a plt.show. So this should work. We're missing something; oh no, we're not computing this histogram for a mask yet, we'll do that next. But let's save that and run. Cool. Oh, I made a small mistake, the title should say color histogram, but that shouldn't make much of a difference. So this is the color histogram that we get for the original image, not for a mask, but the whole image. As you can see, it basically plotted the blue channel, the red channel, and the green channel as well. Using this, you can make out that there's a peak of blue pixels at an intensity of around 30, a peak of red probably around 50, and a peak of green probably around 75 to 80. Cool, and using this, you can basically make out the distribution of pixel intensities for all three color channels. So now let's try to apply a mask by setting the mask parameter equal to mask. Let's see whether we have everything in order.
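The per-channel loop, roughly:

```python
import cv2 as cv
import matplotlib.pyplot as plt

img = cv.imread('Photos/cats.jpg')  # placeholder path

plt.figure()
plt.title('Colour Histogram')
plt.xlabel('Bins')
plt.ylabel('# of pixels')

# One histogram per channel: index 0 = blue, 1 = green, 2 = red
colors = ('b', 'g', 'r')
for i, col in enumerate(colors):
    hist = cv.calcHist([img], [i], None, [256], [0, 256])
    plt.plot(hist, color=col)
    plt.xlim([0, 256])

plt.show()
```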
Hmm, 'masks are not the same size'; okay, I finally got the error. So basically, the mask needs to be in a binary, single-channel format. Instead of passing in this masked image, which is for display, we pass in the actual mask, and we can change circle to mask. Now this should work without any errors, and we can change the display to masked. And now we get the color histogram for this particular masked region. I made a mistake because I used the masked image as the mask when computing the histogram per channel: that masked image was actually three channels, and I attempted to use a three-channel mask to calculate the histogram per channel, which isn't allowed in OpenCV. So that was my mistake; I kind of used the wrong variable names and got confused. But essentially, this is it: you're computing the histogram for a particular section of this image, and this is what you get. There's a high peak of red in this area, a high peak of blue in this area, and a high peak of green somewhere over here. So essentially, that's it for this video. Histograms allow you to analyze the distribution of pixel intensities, whether for a grayscale image or for a color image. These are really helpful in a lot of advanced computer vision projects, when you're actually trying to analyze the image you get, and maybe trying to equalize the image so that there's no peaking of pixel values here and there. In the next video, we'll be talking about how to threshold an image and the different types of thresholding. As always, if you have any questions, leave them in the comments below. Otherwise, I'll see you guys in the next video.
Hey everyone, and welcome back to another video. In this video, we're going to be talking about thresholding in OpenCV. Now, thresholding is a binarization of an image. In general, we want to take an image and convert it to a binary image, that is, an image where pixels are either zero, black, or 255, white. A very simple example of thresholding would be to take an image, pick some particular value that we're going to call the threshold value, and compare each pixel of the image to this threshold value. If that pixel intensity is less than the threshold value, we set that pixel intensity to zero; and if it is above this threshold value, we set it to 255, white. So in this sense, we can essentially create a binary image from a regular standalone image. In this video, we're going to talk about two different types of thresholding: simple thresholding and adaptive thresholding. So let's start off with simple thresholding. Before I talk about it, I want to convert this BGR image to grayscale. So I'm going to say gray is equal to cv.cvtColor; we pass in the image and the color code, which is cv.COLOR_BGR2GRAY, and we can display this image as 'Gray' and pass in gray. Cool. So let's start with the simple thresholding. To apply this idea of simple thresholding, we essentially use the cv.threshold function. Now, this function returns two things, threshold and thresh, and is equal to cv.threshold. It takes in the grayscale image; a grayscale image has to be passed in to this thresholding function. Then we pass in a threshold value, so let's set this to 150 for now, and we have to specify something called a maximum value: if a pixel value is greater than 150, what do you want to set it to? In this case, we want to binarize the image, so we set it to 255. And finally, we specify a thresholding type, which is cv.THRESH_BINARY. What this does is look at the image, compare each pixel value to this threshold value, and if it's above that value, set it to 255; otherwise, if it falls below, set it to zero. So it returns two things: thresh, which is the thresholded, binarized image, and threshold, which is essentially the same value you passed in; the 150 threshold value you pass in will be returned back. So let's actually display this image: cv.imshow, we'll call this 'Simple Thresholded', and we can pass in thresh.
So let's save that and run python thresh.py, and this is the thresholded image that you get. Again, this is nothing too different from when we discussed thresholding in one of the previous videos, but this is essentially what you get. So let's play around with these threshold values. Let's set this to 100 and see what that does. And as a result, more parts of the image have become white. And of course, if you give it a higher value, fewer parts of the image will be white: let's set this to 225, and very few pixels in this thresholded image actually had an intensity greater than 225. What we can do after this is essentially create an inverse thresholded image. We could copy this, and instead of saying thresh, I'm going to say thresh_inv and leave everything else the same; let's set this back to 150, and the same here. And instead of passing in that type of thresholding, I'm going to say cv.THRESH_BINARY_INV. Let's call this 'Simple Thresholded Inverse', and we can pass in the inverse. So let's save that and run. And this is essentially the inverse of the previous image: instead of setting pixel intensities greater than 150 to 255, it sets whatever values are less than 150 to 255. So that's essentially what you get: all the black parts of the previous image have changed to white, and all the white parts have changed to black. Cool. So that's simple thresholding. Let's move on now to adaptive thresholding. Now, as you can imagine, we got different images when we provided different threshold values.
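A sketch of both variants, with the image path a placeholder:

```python
import cv2 as cv

img = cv.imread('Photos/cats.jpg')  # placeholder path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Pixels above 150 become 255, the rest become 0.
# The first return value is just the threshold used (150 here).
threshold, thresh = cv.threshold(gray, 150, 255, cv.THRESH_BINARY)
cv.imshow('Simple Thresholded', thresh)

# Inverse: pixels BELOW 150 become 255 instead
threshold, thresh_inv = cv.threshold(gray, 150, 255, cv.THRESH_BINARY_INV)
cv.imshow('Simple Thresholded Inverse', thresh_inv)

cv.waitKey(0)
```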
Now, kind of one of the downsides to this is that we have to manually specify a specific threshold value. In some cases this might work; in more advanced cases, it will not. So one of the things we could do is let the computer find the optimal threshold value by itself, and use the value it finds to binarize the image. That's, in essence, the entire crux of adaptive thresholding. So let's set up a variable called adaptive_thresh and set it equal to cv.adaptiveThreshold. Inside, I want to pass in a source image, so let's set this to gray, and a maximum value, which is 255. Now, notice there is no threshold value. The adaptive method basically tells the machine which method to use when computing the optimal threshold value; for now, we're just going to use the mean of some neighborhood of pixels, so let's set this to cv.ADAPTIVE_THRESH_MEAN_C. Next, we set a threshold type, which is cv.THRESH_BINARY, again nothing different from the first example. And two other parameters I want to specify are the block size, which is essentially the neighborhood size, the kernel size that OpenCV needs to use to compute the mean and find the optimal threshold value; for now, let's set this to 11. And finally, the last parameter we have to specify is the C value. This C value is essentially an integer that's subtracted from the mean, allowing us to fine-tune our threshold. Again, don't worry too much about this; you could set it to zero, but for now, let's set it to three. And finally, once that's done, we can go ahead and display this image: let's call it 'Adaptive Thresholding', and we can pass in adaptive_thresh. Save that and run. And this is essentially your adaptive thresholding method. So what we've done is define a kernel size, a window, that's drawn over this image; in our case, this is 11 by 11.
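Roughly:

```python
import cv2 as cv

img = cv.imread('Photos/cats.jpg')  # placeholder path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# OpenCV picks a threshold per 11x11 neighborhood as (mean - C), with C = 3
adaptive_thresh = cv.adaptiveThreshold(
    gray, 255, cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY, 11, 3)
cv.imshow('Adaptive Thresholding', adaptive_thresh)

cv.waitKey(0)
```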
And what OpenCV does is compute a mean over those neighborhood pixels and find the optimal threshold value for that specific part. Then it slides over to the right and does the same thing, and slides down and does the same thing, so that it eventually slides over every part of the image. So that's how adaptive thresholding works. If you want, you can change the threshold type to cv.THRESH_BINARY_INV, just to see what's really going on under the hood. Cool: all the white parts of the image have changed to black, and all the black parts have changed to white. So let's play around with these values. Let's set the block size to probably 13 and see what that does. Okay, definitely some difference from the previous hyperparameters. So let's go back to 11, and let's set the C value to maybe one. Okay, definitely more white. Let's set this to maybe five, and you can keep playing around with these values; the more you subtract from the mean, the more you fine-tune the result, and you can basically make out the edges now in this basket. So let's maybe increase that to nine, and you get fewer white spots in the image, but now you can make out the features better. Cool. So that was essentially adaptive thresholding: adaptive thresholding that computed the optimal threshold value on the basis of the mean. Now, we don't have to stick with the mean; we can go with something else. Instead of the mean, let's set this to Gaussian, cv.ADAPTIVE_THRESH_GAUSSIAN_C. Let's save that and see what that does. And this is the thresholded image using the Gaussian method. The only difference is that the Gaussian method adds a weight to each pixel value and computes a weighted mean across those pixels; that's why we got a slightly cleaner image than when we used the plain mean. But essentially, the adaptive thresholding mean works in some cases, and the Gaussian works in other cases; there's no real one-size-fits-all, so really play around with these values and see what you get. But that's essentially all we have to discuss for this video. We talked about two different types of thresholding: simple thresholding and adaptive thresholding. In simple thresholding, we have to manually specify a threshold value, and in adaptive thresholding, OpenCV does that for us, using a specific block size, or kernel size, and computing the threshold value on the basis of the mean or on the basis of a Gaussian-weighted mean. So in the next video, the last video in the advanced section of this course, we're going to be discussing how to compute gradients and edges in an image. So if you have any questions, leave them in the comments below; I'll be sure to check them out. Otherwise, I'll see you guys in the next video. Thanks for watching!
Hey everyone, and welcome back to another video. In this video, we're going to be talking about gradients and edge detection in OpenCV. Now, you could think of gradients as these edge-like regions that are present in an image. They're not the same thing; gradients and edges are completely different things from a mathematical point of view. But you can pretty much get away with thinking of gradients as edges from a programming perspective only. So, in the previous videos, we've discussed the Canny edge detector, which is essentially kind of an advanced edge detection algorithm, a multi-step process. But in this video, we're going to be talking about two other ways to compute edges in an image, and those are the Laplacian and the Sobel methods. So let's start off with the Laplacian. The first thing I want to do is convert this image to grayscale, by calling the cvtColor method: we pass in the image and say cv.COLOR_BGR2GRAY, and we can display this image as 'Gray' and pass in gray. So let's start with the Laplacian. We're going to define a variable called lap and set it equal to cv.Laplacian. What this will do is take in a source image, which is gray for now, and something called ddepth, the data depth. For now, we set this to cv.CV_64F; just go along with it. Next, I'm going to say lap is equal to np.uint8 of np.absolute of lap, and since I'm using NumPy, I can go ahead and import numpy as np. Then, to display this image, we call the cv.imshow method, call this 'Laplacian', and pass in lap. Save and run python gradients.py. Invalid syntax; okay, it's cv.CV_64F. Save that. And this is essentially the Laplacian edges of the image. It kind of looks like an image that was drawn on a chalkboard and then smudged just a bit.
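A sketch of the Laplacian steps just described:

```python
import cv2 as cv
import numpy as np

img = cv.imread('Photos/park.jpg')  # placeholder path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Laplacian gradients can be negative, so compute at 64-bit float depth,
# take the absolute value, then convert back to an image-friendly uint8
lap = cv.Laplacian(gray, cv.CV_64F)
lap = np.uint8(np.absolute(lap))
cv.imshow('Laplacian', lap)

cv.waitKey(0)
```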
But anyway, that's the Laplacian method. Let's try this with another image; let's try it with this park in Boston. Let's call this the park. Save that and run. And this one essentially looks like a pencil shading of the image: all the edges that exist in the image, or at least most of them, are essentially drawn over with a pencil and then lightly smudged. So that's essentially the Laplacian edges, you could say. Again, don't worry too much about why we converted this to uint8 after computing the absolute value, but essentially, the Laplacian method computes the gradients of this grayscale image. Generally, this involves a lot of mathematics, but essentially, when you transition from black to white and from white to black, that's considered a positive and a negative slope. Now, images themselves cannot have negative pixel values. So what we do is compute the absolute value of that result, so all the pixel values of the image are converted to their absolute values, and then we convert that to uint8, an image-specific datatype. So that's basically the crux of what's going on right over here.
So let's move on to the next one, and that is the Sobel gradient magnitude representation. Essentially, the way this works is that Sobel computes the gradients in two directions, x and y. So we're going to say sobelx, which is the gradients computed along the x axis, and set this equal to cv.Sobel. We pass in the image, let's set this to the grayscale image, we pass in a data depth, which is cv.CV_64F, and we can give it an x direction, so let's set this to 1, and a y direction, which we can set to 0. Let's copy this and call it sobely, and instead of 1, 0 we say 0, 1. And we can visualize these: let's call cv.imshow with 'Sobel X' and pass in sobelx, and another cv.imshow with 'Sobel Y' set to sobely. Run that, and these are essentially the gradients that were computed: sobelx shows the gradients along the x axis, and sobely
shows the gradients along the y axis. Now, we can essentially get the combined Sobel image by combining these two, sobelx and sobely, and the way we do that is we say combined_sobel and set this equal to cv.bitwise_or, passing in sobelx and sobely. And we can display this image: let's call cv.imshow, set it to 'Combined Sobel', and pass in the combined_sobel. Let's run that. And this is essentially the combined Sobel that you get. Let's go back here: it essentially took those two gradient images, applied cv.bitwise_or, and got this image. Now, if you want to compare this with the Laplacian: these are two completely different algorithms, so the results you get will be completely different.
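And here's a minimal sketch of that Sobel part, again with an illustrative image path:

```python
import cv2 as cv

img = cv.imread('Photos/park.jpg')            # illustrative path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Sobel computes gradients along one axis at a time
sobelx = cv.Sobel(gray, cv.CV_64F, 1, 0)      # gradients along the x axis
sobely = cv.Sobel(gray, cv.CV_64F, 0, 1)      # gradients along the y axis

# OR the two gradient images together to get the combined map
combined_sobel = cv.bitwise_or(sobelx, sobely)

cv.imshow('Sobel X', sobelx)
cv.imshow('Sobel Y', sobely)
cv.imshow('Combined Sobel', combined_sobel)
cv.waitKey(0)
```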
Okay, so let's compare both of these, the Laplacian and the Sobel, with the Canny edge detector. Let's go down here and say canny is equal to cv.Canny, and we can pass in the image. Let's pass the grayscale image, give it two threshold values of 150 and 175, and we're done. Let's display this image, call it 'Canny', and pass in canny. Save that, and let's see what that gives us. So that's essentially it. The Laplacian gradient representation essentially returns this kind of pencil-shading version of the edges in the image; combined Sobel computes the gradients in the x and y directions, and we combine the two with bitwise OR; and Canny is basically a more advanced algorithm that actually uses Sobel in one of its stages. Like I mentioned, Canny is a multi-stage process, and one of its stages uses the Sobel method to compute the gradients of the image. So essentially, you see that the Canny edge detector gives a much cleaner version of the edges that can be found in the image. That's why in most cases you're going to see Canny used. But in more advanced cases, you're probably going to see Sobel used a lot, and not necessarily the Laplacian.
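For completeness, the Canny comparison from this video boils down to a couple of lines:

```python
import cv2 as cv

img = cv.imread('Photos/park.jpg')            # illustrative path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Canny is a multi-stage detector (one of its stages runs Sobel internally),
# so its edges come out much cleaner than raw Laplacian or Sobel gradients
canny = cv.Canny(gray, 150, 175)

cv.imshow('Canny', canny)
cv.waitKey(0)
```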
So that's pretty much it for this video, and in fact, this video concludes the advanced section of this course. Moving on to the next section, we'll be discussing face detection and face recognition in OpenCV. We're actually going to touch on using Haar cascades to perform some face detection, and for face recognition we actually have two parts: face recognition with OpenCV's built-in face recognizer, and a second part where we'll actually build our own deep learning model to essentially recognize some faces in an image. Again, like always, if you have any questions, leave them in the comments below. Otherwise, I'll see you guys in the next section. Hey everyone, and welcome back to another video. We are now at the last part of this Python and OpenCV course, where we're going to talk about face detection and face recognition in OpenCV. So what we're going to be doing in this video is actually discussing how to detect faces in OpenCV using something called a Haar cascade. In the next video, we'll talk about how to recognize faces using OpenCV's built-in face recognizer. And
after that, we'll be implementing our own deep learning model to recognize between the Simpsons characters. We're going to create that from scratch and use OpenCV for all the preprocessing and displaying of images and things like that. So let's get into this video. Now, face detection is different from face recognition. Face detection merely detects the presence of a face in an image, while face recognition involves identifying whose face it is. Now, we'll talk more about this later on in this course. But essentially, face detection is performed using classifiers. A classifier is essentially an algorithm that decides whether a given image is positive or negative, that is, whether a face is present or not. Now, a classifier needs to be trained on thousands and tens of thousands of images with and without faces. But fortunately for us, OpenCV already comes with a lot of pre-trained classifiers that we can use in any program. So essentially, the two main classifiers that exist today are Haar cascades, and more advanced classifiers called local binary patterns. We're not going to talk about local binary patterns at all in this course, but essentially, those more advanced local binary pattern classifiers
are not as prone to noise in an image as the Haar cascades are. So I'm currently at OpenCV's GitHub page, where they store their Haar cascade classifiers. And as you can see, there are plenty of Haar cascades that OpenCV makes available to the general public. You have a Haar cascade for an eye, a frontal cat face, a frontal face (default), a full body, a left eye, a Russian license plate and a Russian plate number (I think those are the same thing), a Haar cascade to detect a smile, a Haar cascade for detection of the upper body, and things like that. So feel free to use whatever you want, but in this video we're going to be performing face detection, and for this we're going to use haarcascade_frontalface_default.xml. When you go ahead and open that, you're going to get about 33,000 lines of XML code. So what you have to do is essentially go to the Raw button, and you'll get all this raw XML code. All you have to do is hit Ctrl-A, or Command-A if you're on a Mac, then Ctrl-C or Command-C, then go to VS Code or your editor and create a new file. We're going to call it haar_face.xml, and inside it I want you to paste in those 33,000 lines of XML code. Go ahead and save that, and our classifier is ready, so we can close this out. We're going to be using this Haar cascade classifier to essentially detect the faces that are present in an image.
So in this file called face_detect.py, I've imported OpenCV and read in an image of a person, this image over here. We can go real quick and display it: let's run python face_detect.py, and we get the image in a new window. Cool. So let's actually implement our code. The first thing I want to do is convert this image to grayscale. Now, face detection doesn't involve skin tone or the colors that are present in the image; these Haar cascades essentially look at an object in an image and, using the edges, try to determine whether it's a face or not. So we really don't need color in our image, and we can go ahead and convert it to grayscale with cv.cvtColor, passing in the image and cv.COLOR_BGR2GRAY. We can display this: call it 'Gray Person' and pass in gray. Save and run... we have to pass in the gray. Okay, we have a gray person over here. So let's move on to essentially reading in this haar_face.xml file.
The way we do that is by essentially creating a Haar cascade variable. So let's call it haar_cascade, and we're going to set this equal to cv.CascadeClassifier. Inside, what I essentially want to do is pass in the path to this XML file, which is as simple as saying 'haar_face.xml'. So this CascadeClassifier class will essentially read in those 33,000 lines of XML code and store them in a variable called haar_cascade. So now that we've read in the Haar cascade file, let's actually try to detect the face in this image over here. What I'm going to do is essentially say faces_rect is equal to haar_cascade.detectMultiScale, and inside we're going to pass in the image that we want to detect on, which is gray. We're going to pass in a scale factor; let's set this to 1.1. And we give it a parameter called minNeighbors, which essentially specifies the number of neighboring rectangles a candidate needs to have to be called a face.
Let's set this to 3 for now. So that's it, that's all we have to do. Essentially, what this detectMultiScale method, an instance method of the CascadeClassifier class, will do is take this image, use these parameters called scaleFactor and minNeighbors to detect a face, and return the rectangular coordinates of that face as a list to faces_rect. That's exactly why we're calling it faces_rect: rect for rectangle. So you can essentially print the number of faces that were found in this image by printing the length of this faces_rect variable. So let's do that: let's print 'Number of faces found =', and we pass in the length of faces_rect. Save that and run. And as you can see, the number of faces found is one, and that's true, because there's only one person in this image. Cool. Now, utilizing the fact that faces_rect is essentially the rectangular coordinates of the faces present in the image, what we can do is loop over
this list, essentially grab the coordinates, and draw a rectangle over each detected face. So let's do that. The way we do that is by saying for x, y, w, h in faces_rect, we're going to draw a rectangle with cv.rectangle over the original image. So we pass img; point one, which is essentially (x, y); and point two, which is essentially (x + w, y + h). Let's give it a color, let's set this to green, so (0, 255, 0), and give it a thickness of 2. And that's it. We can display this image: let's call it 'Detected Faces', and we can pass in img. And if you look at this image, you can essentially see the rectangle that was drawn over it. This, in essence, is the face that OpenCV's Haar cascade found in this image. The whole script is sketched below so you can see it in one place.
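A minimal sketch of the full detection script; the image path is illustrative, and haar_face.xml is the file we pasted the raw XML into:

```python
import cv2 as cv

img = cv.imread('Photos/lady.jpg')            # illustrative path
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# Read in the ~33,000 lines of XML we saved as haar_face.xml
haar_cascade = cv.CascadeClassifier('haar_face.xml')

# Returns the rectangular coordinates (x, y, w, h) of every detected face
faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
print(f'Number of faces found = {len(faces_rect)}')

for (x, y, w, h) in faces_rect:
    cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), thickness=2)

cv.imshow('Detected Faces', img)
cv.waitKey(0)
```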
So let's try this with another image. What I have here is an image of five people, so we're going to use that image and see how many faces OpenCV's Haar cascades can detect in it. So let's set this to the group photo, change the window title to 'Group of 5 people', save and run. And I want to point out real quick that the number of faces found was actually seven. Now, we know there are five people in this image, so let's actually see what OpenCV thought was a face. So it actually detected all the faces in this image, all five people, but it also detected two other guesses: a stomach and part of a neck. Now, this is to be expected, because Haar cascades are really sensitive to noise in an image. So if you have something that pretty much looks like a face, like a neck, it has the same structure a typical face would have. I don't know why the stomach was recognized as a face, but again, this is to be expected. So one way we can try to minimize the sensitivity to noise is essentially by
modifying the scale factor and minimum neighbors. So let's increase the minimum neighbors to maybe six or seven. Save that and run. And as you can see, now six faces were found. So I guess by increasing the minNeighbors parameter, we essentially stopped OpenCV from detecting the stomach as a face. So let's try this with another, more complex image with a couple more people, group one. So if I change that to group one, save and run... now, as you can see, the number of faces found was six, and we know that this is not six. So let's change this minNeighbors just a bit. Let's change it first to three and see how many faces are found. Now we got 14. Okay, some people at the back weren't chosen, because either their faces aren't perfectly perpendicular to the camera, or they're wearing some accessories on the face, for example eyeglasses; this dude's wearing a hat, this dude's wearing a cap, and things like that. So let's change this to one, and let's see what that gets us. Run, and now we got 19 faces found in this image. So it's all about playing with these values: by tweaking them, you can essentially get a more robust result. But of course, by minimizing these values, you're essentially making OpenCV's Haar cascades more prone to noise. That's the trade-off you need to consider. Now, again, Haar cascades are not the most effective way of detecting faces. They're popular, but they're not the most advanced, and they're probably not what you would use if you were to build more advanced computer vision projects. I think for that, Dlib's face recognizer is more effective and less sensitive to noise than OpenCV's Haar cascades. But depending on your use case, Haar cascades are more popular: they're easy to use, and they require minimal setup. And if you wanted to extend this to videos, you could; all you have to do is essentially run the Haar cascade on each individual frame of the video. Now, I'm skipping that in the video because it's pretty self-explanatory, but there's a quick sketch of it below.
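A minimal sketch of that per-frame version; it's not from the video itself, and it assumes a webcam at index 0 (a video file path works too):

```python
import cv2 as cv

haar_cascade = cv.CascadeClassifier('haar_face.xml')
capture = cv.VideoCapture(0)                   # 0 = webcam; a file path works too

while True:
    isTrue, frame = capture.read()
    if not isTrue:
        break

    gray = cv.cvtColor(frame, cv.COLOR_BGR2GRAY)

    # Run the cascade on each individual frame, exactly like a still image
    for (x, y, w, h) in haar_cascade.detectMultiScale(gray, 1.1, 3):
        cv.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), thickness=2)

    cv.imshow('Video Face Detection', frame)
    if cv.waitKey(20) & 0xFF == ord('d'):      # press d to quit
        break

capture.release()
cv.destroyAllWindows()
```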
So that's pretty much it for this video. We discussed how to detect faces in OpenCV using OpenCV's Haar cascades. In the next video, we'll actually talk about how to recognize faces in OpenCV using OpenCV's built-in face recognizer. So like always, if you have any questions, comments, concerns, whatever, leave them in the comments below. Otherwise, I'll see you in the next video. Hey everyone, and welcome back to another video. In this video, we will learn how to build a face recognition model in OpenCV using OpenCV's built-in face recognizer. Now, in the previous video, we dealt with detecting faces in OpenCV using Haar cascades; this video will actually cover how to recognize faces in an image. So what I have here are five folders for five different people. Inside each folder, I have about 20 images of that particular person. So Jerry Seinfeld has 21 images, Elton John has 17, Mindy Kaling has 22, Ben Affleck has 14, and so on. What we're essentially going to do is use OpenCV's built-in face recognizer and train it on all the images in these five folders. Now, this is sort of like building a mini deep learning model,
except that we're not going to build any model from scratch. We're going to use OpenCV's built-in face recognizer: we're going to pass in these close-to-90 images and train the recognizer on them. So let's create a new file, and we're going to call it faces_train.py. We're going to import os, we're going to import cv2 as cv, and we're going to import numpy as np. The first thing I want to do is essentially create a list of all the people in the images. This is essentially the names of the folders of these particular people. What you could do is manually type those in, or you could essentially create an empty list, let's call it p, loop over every folder in this base folder with os.listdir, and say p.append(i), and then print p. Save that and run python faces_train.py, and we get the same list that we had over here. So that's one way of doing it.
What I'm going to do next is essentially create a variable called DIR and set this equal to the base folder, that is, the folder which contains these five folders of people. Cool. So with that done, what we can do is essentially create a function called create_train that will loop over every folder in this base folder, and inside each folder it will loop over every image, essentially grab the face in that image, and add it to our training set. So our training set will consist of two lists. The first one is called features, which holds the image arrays of the faces; let's set this to an empty list. The second list will be our corresponding labels: for every face in the features list, whose face does it belong to? Like, one image could belong to Ben Affleck, the second image could belong to Elton John, and so on. So let's create the function. We're
going to loop over every person in this people list, and grab the path for that person: for every folder in this base folder, we go through each folder and grab the path to it. That's essentially as simple as saying os.path.join, and we can join DIR with person. What I'm also going to do is create a label variable and set this equal to people.index(person). Now, inside each folder, we're going to loop over every image in that folder. So we say for img in os.listdir(path), and we grab the image path: img_path is equal to os.path.join, joining the path variable with the image. Now that we have the path to an image, we're going to read in that image from the path: we create a variable called img_array, equal to cv.imread(img_path). We're going to convert this image to grayscale with cv.cvtColor, passing in img_array and cv.COLOR_BGR2GRAY. Cool. And now, with that done, we can essentially try to detect the faces in this image. So let's go back to face_detect.py, grab the Haar cascade classifier variable from there, and paste it here. Then we create faces_rect and set it equal to haar_cascade.detectMultiScale. This will take in the gray image, a scale factor of 1.1, and
a minNeighbors of 4. And we can loop over every face in this faces_rect: for x, y, w, h in faces_rect, we're going to grab the face's region of interest, basically cropping out the face in the image. So we say gray[y:y+h, x:x+w]. And now that we have a face region of interest, we can append it to the features list, and we can append the corresponding label to the labels list. So we do features.append(faces_roi) and labels.append(label). This label variable is essentially the index in the people list. Now, the idea behind converting a label to a numerical value is essentially to reduce the strain on your computer by creating a mapping between a string and a numerical label. The mapping we're going to use is simply the index in that list. So let's say I grab the first image, which is an image of Ben Affleck: the label for that would be 0, because Ben Affleck is at the zeroth index of this people list. Similarly, an image of Elton John would have a label of 1, because he's at the second position, or the first index, in this people list. So that's essentially the idea behind this. Now, with that done, we can essentially try to run this and see whether we get any errors. Let's print the length of the features list, and we can do the same thing for the labels list. That shouldn't give us any errors, so let's run it. And we get a length of 100 for the features and 100 for the labels. So essentially, we have 100 faces and 100 corresponding labels. Here's the whole training-set builder in one place:
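A minimal sketch; the folder names and DIR path are illustrative, so match them to wherever your five folders actually live:

```python
import os
import cv2 as cv
import numpy as np

# Illustrative names and base folder
people = ['Ben Affleck', 'Elton John', 'Jerry Seinfeld', 'Madonna', 'Mindy Kaling']
DIR = r'Faces/train'

haar_cascade = cv.CascadeClassifier('haar_face.xml')

features = []   # image arrays of the cropped faces
labels = []     # index into `people` for each face

def create_train():
    for person in people:
        path = os.path.join(DIR, person)
        label = people.index(person)

        for img_name in os.listdir(path):
            img_array = cv.imread(os.path.join(path, img_name))
            gray = cv.cvtColor(img_array, cv.COLOR_BGR2GRAY)

            faces_rect = haar_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                       minNeighbors=4)
            for (x, y, w, h) in faces_rect:
                faces_roi = gray[y:y+h, x:x+w]   # crop out just the face
                features.append(faces_roi)
                labels.append(label)

create_train()
print(f'Length of the features list = {len(features)}')
print(f'Length of the labels list = {len(labels)}')
```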
So we don't need the length prints anymore. What we can do now is use these features and labels lists to train our recognizer. The way we do that is to instantiate our face recognizer as an instance of the cv.face.LBPHFaceRecognizer_create class, and this will essentially instantiate the face recognizer. Now we can actually train the recognizer on the features list and the labels list, by saying face_recognizer.train and passing in features and labels. But before we actually do that, I do want to convert these features and labels lists to NumPy arrays. So we say features is equal to np.array(features), and labels is equal to np.array(labels). Save that and run... okay, a dtype error, so let's set the dtype to object: np.array(features, dtype='object'). And we can actually print when this is done, so let's say 'Training done'. We can also go ahead and save these features and labels arrays: we say np.save('features.npy', features), and np.save('labels.npy', labels). Save that and run. Cool. So essentially, the face recognizer is now trained and we can use it. But the problem here is that if we plan to use this face recognizer in another file, we'd have to separately and manually repeat this whole process of adding those images to a list, getting the corresponding labels, converting them to NumPy arrays, and then training all over again. What we can do, and what OpenCV allows us to do, is essentially save this trained model so that we can use it in another file, in another directory, in another part of the world, just by using that particular YAML source file. So we're going to repeat this process again, but the only change I'm going to make is to say face_recognizer.save, and give it the path to a YAML source file: we'll save it as face_trained.yml. Let's run this again... training's done. Here's a sketch of this training and saving step:
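This picks up right where the create_train() sketch above left off, so it assumes the features and labels lists and imports from there:

```python
# Assumes `features`, `labels`, cv, and np from the create_train() sketch
features = np.array(features, dtype='object')   # object dtype: the face crops vary in size
labels = np.array(labels)

face_recognizer = cv.face.LBPHFaceRecognizer_create()
face_recognizer.train(features, labels)          # train on the cropped faces

print('Training done ---------------')

face_recognizer.save('face_trained.yml')         # the reusable model file
np.save('features.npy', features)
np.save('labels.npy', labels)
```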
And now you'll notice that you have a face_trained.yml file in this directory, as well as features.npy and labels.npy. So let's actually use this trained model to recognize faces in an image. Let's close this out and create a new file, and we're going to call it face_recognition.py. Very simply, we're going to import numpy as np and cv2 as cv. We don't need os anymore, because we're not looping over directories. We can essentially create our haar_cascade variable, so let's go up here and grab that. We can load our features and labels arrays by saying features is equal to np.load('features.npy'), and labels is equal to np.load('labels.npy'). And we can now read in this face_trained.yml file. So let's go over here, grab the recognizer line, and say face_recognizer.read, giving it the path to the YAML source file, face_trained.yml. So that's pretty much all we need. Now we need to get the mapping, so let's grab the people list as well. And that's pretty much all we have to do. So let's create an image variable, set it to cv.imread, and give it a path: let's grab an image from the validation folder. I have one of Ben Affleck, so let's grab that, a JPG file.
And we can convert that image to grayscale with cv.cvtColor, passing in the image and cv.COLOR_BGR2GRAY. Let's display this image, call the window 'Person', and there's the person on the board. So what we're going to do first is detect the face in the image. The way we do that is by saying faces_rect is equal to haar_cascade.detectMultiScale: we pass in the gray image, we pass in the scale factor, which is 1.1, and we give it a minNeighbors of 4. And we can loop over every face in this faces_rect: so for x, y, w, h in faces_rect, we grab the region of interest we're interested in, gray[y:y+h, x:x+w]. And now we can predict using this face recognizer: we get back a label and a confidence value when we say face_recognizer.predict, and we predict on this faces_roi. Let's print: 'Label =' label, 'with a confidence of' confidence. And since we're using numerical labels, we can say people[label]. Okay. And what we can also do is put some text on this image, just to show us what's really going on. We can call cv.putText on the image, create a string of people[label], so the person found in that image, give it an origin, let's say (20, 20), and give it a font face of cv.FONT_HERSHEY_COMPLEX.
Give it a font scale of 1.0, a color of (0, 255, 0), and a thickness of 2. And we can draw a rectangle over the face: we draw this over the image, give it (x, y) and (x + w, y + h), a color of (0, 255, 0), and a thickness of 2. So with that done, we can display this image, call it 'Detected Face', and pass in the image. And finally, we can do a cv.waitKey(0). So let's save and see what we get: python face_recognition.py... 'Cannot load file: allow_pickle=False'. Gosh, where's that? Oh, we probably don't need those np.load lines anymore, so let's comment them out. If you wanted to use them again, since the data types are objects, you could basically say allow_pickle=True in np.load. But we're not going to use them, so let's comment that out. The full recognition script is sketched below.
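A minimal sketch of the whole recognition script; the validation image path and the people list are illustrative:

```python
import cv2 as cv

people = ['Ben Affleck', 'Elton John', 'Jerry Seinfeld', 'Madonna', 'Mindy Kaling']

haar_cascade = cv.CascadeClassifier('haar_face.xml')
face_recognizer = cv.face.LBPHFaceRecognizer_create()
face_recognizer.read('face_trained.yml')         # load the saved model

img = cv.imread(r'Faces/val/ben_affleck/1.jpg')  # illustrative validation image
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

faces_rect = haar_cascade.detectMultiScale(gray, 1.1, 4)
for (x, y, w, h) in faces_rect:
    faces_roi = gray[y:y+h, x:x+w]

    label, confidence = face_recognizer.predict(faces_roi)
    print(f'Label = {people[label]} with a confidence of {confidence}')

    cv.putText(img, str(people[label]), (20, 20), cv.FONT_HERSHEY_COMPLEX,
               1.0, (0, 255, 0), thickness=2)
    cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), thickness=2)

cv.imshow('Detected Face', img)
cv.waitKey(0)
```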
Save and run. And okay, we get Ben Affleck with a confidence of 60. That's pretty good, given the fact that we only trained this recognizer on 100 images. So let's try this with another image of Ben Affleck, maybe this one; copy that, paste it right across here. And this again is Ben Affleck, with a confidence of 94. Pretty good. Let's go back, let's go maybe to another person. Let's go to Madonna. Grab this one; it's a pain, really, but let's change the path to Madonna and grab this image. I'm not sure whether it will detect a face because of the hat, but let's try it anyway. Now, this is where you'll find that OpenCV's built-in face recognizer is not the best: it currently says that the person in the image is actually Jerry Seinfeld, and with a confidence of 110. A quick note on that number: the 'confidence' that the LBPH recognizer's predict method returns isn't a percentage at all, it's a distance measure, where lower means a closer match, which is why values over 100 are possible. But essentially, this is where the discrepancies lie. It's not the best, so it's not always going to give you accurate results.
So let's try this with another image of Madonna; copy that, paste it there. Okay, this is Madonna, with a confidence of 96.8. Okay, let's move on to Elton John. I once had problems with Elton John, given the fact that he apparently looked pretty similar to Ben Affleck for some reason. Copy that, change the path to Elton John, and run that. Okay, Elton John with a confidence of 67, pretty good. Okay, so not bad; this is more accurate than what I predicted. Before filming this video, I did a couple of trial runs, and I got some very odd results. For example, Elton John was continually detected as Jerry Seinfeld or Ben Affleck, Madonna was detected as Ben Affleck, Ben Affleck was detected as Mindy Kaling, Mindy Kaling was detected as Elton John, and a whole bunch of other weird results. So I guess we did something right here, and I must have done something wrong in the trial runs. But hey, we get good results, and that's pretty good. Now, I'm not sure why that one gave a confidence of 110; maybe there's an error somewhere with the training set. But I guess for the most
part, you can ignore that, given the fact that we get pretty good results. So that's pretty much it for this video. We discussed face recognition in OpenCV: we essentially built a features list and a labels list, trained a recognizer on those two lists, and saved the model as a YAML source file. In another file, we essentially read in that saved YAML model and made predictions on an image. And so, in the next video, which will actually be the last video in this course, we will discuss how to build a deep learning model to detect and classify between 10 Simpsons characters. So if you have any questions, comments, concerns, whatever, leave them in the comments below. Otherwise, I'll see you in the next video. Hey everyone, and welcome to the last video in this Python and OpenCV course. Previously, we've seen how to detect and recognize faces in OpenCV, and the results we got were varied. Now, there are a couple of reasons for that. One is the fact that we only had 100 images to train the recognizer on.
That's a significantly small number, especially when you're training recognizers and building models; ideally, you'd want to have at least a couple of thousand images per class. The second reason lies in the fact that we weren't using a deep learning model. Now, as you go deeper, especially into computer vision, you will see that there are very few things that can actually beat a deep learning model. So that's what we're going to be doing in this video: building a deep computer vision model to classify between the Simpsons characters. Now, generally OpenCV is used for preprocessing the data, that is, performing some sort of image normalization, mean subtraction, and things like that. But in this video, we're going to be building a very simple model, so we're not going to be using any of those techniques. In fact, we'll only be using the OpenCV library to read in images and resize them to a particular size before feeding them into the network. Now, don't worry if you've never built a deep learning model before. This video will be using TensorFlow's implementation of Keras, and I want to keep this video real simple, just so
you have an idea of what really goes on in more advanced computer vision projects. And Keras actually comes with a lot of boilerplate code, so if you've never built a deep learning model before, don't worry, Keras will handle that for you. Now, kind of a prerequisite to building a deep learning model is actually having a GPU. A GPU is basically a graphical processing unit that will help speed up the training process of a network. But if you don't have one, again, don't worry, because we'll be using Kaggle, a platform which actually offers free GPUs for us to use. So real simple, before we get started, we need a couple of packages installed. If you haven't already installed caer at the beginning of this course, go ahead and do a pip install caer. The next package you require is canaro, and this is a package that I built specifically for deep learning models built with Keras. This will actually prove surprisingly useful to you if you're planning to go deeper into building deep computer vision models. Now, installing this package on your system will only make sense if you already have a GPU on your machine; if you don't, then you can basically skip this part. So we can do a pip install canaro, and canaro actually installs TensorFlow by default, so just keep that in mind. So with all the installations out of the way, let's actually move on to the data that we're going to be using. The dataset we're going to use is the Simpsons character dataset that's available on Kaggle, and the actual data we're interested in lives in this simpsons_dataset folder. This basically consists
of a number of folders with several images inside each subfolder. So Maggie Simpson has about 1,128 images, Homer Simpson has about 2,200 images, Abraham has about 913 images, and so on. So essentially, what we're going to do is use these images and feed them into our model to essentially classify between these characters. So the first thing we want to do is go to kaggle.com/notebooks, go ahead and create a new notebook, and under advanced settings, make sure that the GPU is selected, since we're going to be using the GPU. Off of that, click Create, and we should get a notebook. So we're going to rename this to Simpsons, and one thing I want to do is enable the internet, since we're going to be installing a couple of packages over the internet. To use the Simpsons character dataset in our notebook, you need to head to Add Data and search for Simpsons, and the first one, by Alexandre Attia, should pop up; go ahead and click Add, and we can now use this dataset inside the notebook. So the first thing we do is pip install caer and canaro. Now, the reason why I'm doing this yet again is because Kaggle does not come pre-installed with caer and canaro. Now, I did tell you to install them on your machine earlier, and the reason for that is so you all can work with them and experiment with them. So once that's done, go ahead to a new cell, and let's import all the packages that we're going to need. We're going
to import os, we're going to import caer, we're going to import canaro, we're going to import numpy as np, we're going to import cv2 as cv, and we're going to import gc for garbage collection. Next, what we want to do is: basically, when building deep computer vision models, your model expects all your data, or all your image data, to be of the same size. Since we're working with image data, that size is the image size. So all the images in our dataset will actually have to be resized to a particular size before we can feed them into the network. Now, with a lot of experiments, I found that an image size of 80 by 80 works well, especially for this Simpsons dataset. Okay, the next variable we need is the channels, that is, how many channels we want in our images. And since we do not require color in our images, we're going to set this to 1, basically grayscale. Run that. What we need next is a char_path variable, set equal to the base path where all the actual data lives, and that is this simpsons_dataset folder,
the base folder where all our images are stored. So we're going to copy this file path and paste that in. Cool. So essentially, what we're going to be doing now is grab the top 10 characters, the ones which have the most images for their class. And the way we're going to do that is: we go through every folder inside simpsons_dataset, get the number of images stored for that character, store all of that information inside a dictionary, sort that dictionary in descending order, and then grab the first 10 elements. Hope that made sense. So what we're going to do is create an empty dictionary, and say: for char in os.listdir(char_path), char_dict[char] is equal to len(os.listdir(os.path.join(char_path, char))). So essentially, all that we're doing is going through every folder, grabbing the name of the folder, and getting the number of images in that folder, and we're storing all that information inside the dictionary called char_dict. Once that's done, we can actually sort this dictionary in descending order, and the way we do that is with char_dict equals caer.sort_dict(char_dict), with descending set to True. And finally, we can print the dictionary that we get. So this is the dictionary that we have. As you
can see, Homer Simpson has the most images, at close to 2,300, and we go all the way down to Lionel, who has only three images in the dataset. So now that we have this dictionary, what we're going to do is grab the names of the first 10 elements in it, and store them in a characters list. So we say characters is equal to an empty list, and we say: for i in char_dict, characters.append(i[0]), and if count is greater than or equal to 10, we break. We need to declare a count of zero and increment that count inside the loop. Okay, once that's done, let's print what our characters list looks like. So we've essentially just grabbed the names of the top 10 characters. With that done, we can actually go ahead and create the training data, and creating the training data is as simple as saying train is equal to caer.preprocess_from_dir. We pass in the char_path, the characters list, the number of channels, the image size, and we say isShuffle equals True. So essentially, what this will do is go
through every folder inside char_path, which is simpsons_dataset, and look at every element inside characters. So essentially, it's going to look for Homer Simpson inside simpsons_dataset, and when it finds Homer Simpson, it will go inside that folder, grab all the images inside it, and essentially add them to our training set. Now, as you may recall from the previous video, a training set is essentially a list where each element is another list of the image array and the corresponding label. The label we use is basically the index of that particular string in the characters list; that's the same type of mapping we used before. So Homer Simpson will have a label of 0, Ned will have a label of 1, Lisa will have a label of 3, and so on. So once that's done, go ahead and run this. The progress is basically displayed in the output; if you don't want anything printed, you can basically just set the verbosity to 0, but I'm going to leave things just as they are. Since there are a lot of images inside this dataset, this may take a while, depending on how powerful your machine is. So that only took about a minute or so to preprocess our data. Here's a sketch of everything up to this point:
So let's see how many images there are in this training set. We do that by taking the length of train, and we have 13,811 images in this training set. So let's actually try to visualize the images present in this dataset. We're going to import matplotlib.pyplot as plt, we're going to do a plt.figure, and we're going to give it a figsize of 30 by 30. Then we do a plt.imshow, and we can pass in the first element of the first element in this training set, so train[0][0], and give it a colormap of gray. And we can display this image. Now, the reason why I'm not using OpenCV to display this image is because, for some reason, OpenCV does not display properly in Jupyter notebooks; that's why we're using Matplotlib. So this is basically the image that we get. Barely legible, but to a machine, this is a valid image. Okay, the next thing we want to do is separate the training set into the features and the labels. Right now, train basically
is a list of 13,811 sub-lists, and inside each of those sub-lists are two elements: the actual image array and the label itself. So we're going to separate the feature set, the arrays, and the labels into separate lists, and the way we do that is by saying featureSet, labels is equal to caer.sep_train: we pass in the training set and give it an IMG_SIZE of IMG_SIZE. So basically, what this is going to do is separate the training set into the feature set and the labels, and also reshape the feature set into a four-dimensional tensor, so that it can be fed into the model with no restrictions whatsoever. So go ahead and run that. And once that's done, let's actually normalize the feature set. Essentially, we're going to normalize the data to be in the range of (0, 1), and the reason for this is that if you normalize the data, the network will be able to learn the features much faster than if you don't. So we say featureSet is equal to caer.normalize, and we pass in featureSet. Now, we don't have to normalize the labels, but we do need to one-hot encode them, that is, convert them from numerical integers to binary class vectors. And the way we do that is by saying from tensorflow.keras.utils import to_categorical, and then labels is equal to to_categorical, where we pass in the labels and the number of categories, which is basically the length of this characters list. Cool.
that's done, so once that's done, we can actually move ahead and try to create our training and validation data. Now, don't worry too much if you don't know what these are. But basically, the model is going to train on the training data and test itself on the validation data. And we're going to say x underscore train x underscore Val and y underscore train and y underscore Val is equal to sere dog train, Val split. And we're going to split the feature set and the labels using a particular validation ratio, which we're going to set as
point two. So that's basically what we're doing, we're splitting the feature set and labels into into training sets and validation sets with using a particular validation ratio to 20% of this data will go to the validation set, and 80% will go to the training set. Okay. Now, just to save on some memory, we can actually remove and delete some of the variables and we're not going to be using. So we do that by saying Dell crane, Dale feature sets, do labels, and we can collect this by saying GC dot collect. Cool. Now moving on, we
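Continuing from the sketch above, with the same caveat that caer.sep_train, caer.normalize, and caer.train_val_split are used here with the argument names as they appear in this video:

```python
import gc
import caer
from tensorflow.keras.utils import to_categorical

# Split the (image, label) pairs apart; featureSet comes back as a 4-D tensor
featureSet, labels = caer.sep_train(train, IMG_SIZE=IMG_SIZE)

# Normalize pixel values to the (0, 1) range so the network learns faster
featureSet = caer.normalize(featureSet)

# One-hot encode the integer labels into binary class vectors
labels = to_categorical(labels, len(characters))

# 80/20 split between training and validation data
x_train, x_val, y_train, y_val = caer.train_val_split(featureSet, labels, val_ratio=.2)

# Free up memory we no longer need
del train, featureSet, labels
gc.collect()
```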
Now, moving on, we need to create an image data generator. This is basically a generator that will essentially synthesize new images from the already existing ones, to help introduce some randomness to our network and make it perform better. So we say datagen is equal to canaro.generators.imageDataGenerator, and this basically instantiates a very simple image generator using the Keras library. Once that's done, let's create a training generator by setting train_gen equal to datagen.flow, where we pass in x_train and y_train and give it a batch size equal to BATCH_SIZE. So let's actually create some variables up here: let's set the batch size to 32, and maybe let's train the network for 20 epochs. Once that's done, run that. So with that done, we can actually proceed to building our model. Let's call this section creating the model. Now, before making this video, I actually tried and tested out a couple of models, and found one that provided me with the highest level of accuracy. So that's the same model architecture that we're going to be using. We're
going to say model is equal to canaro.models.createSimpsonsModel. We pass in an IMG_SIZE equal to the image size, we set the number of channels equal to channels, we set the output dimensions to 10, which is basically the length of our characters list, then we specify a loss, which is binary cross-entropy, we set a decay of 1e-6, we set a learning rate of 0.001, we set a momentum of 0.9, and we set nesterov to True. So this will essentially create the model using the same architecture I built, and will actually compile the model so that we can use it. So go ahead and run this, and we can go ahead and print the summary of this model. Essentially, what we have is a functional model, since we're using Keras's functional API, and it essentially has a bunch of layers and about 17 million parameters to train.
So another thing I want to do is create something called a callbacks list. Now, this callbacks list will contain something called a learning rate scheduler, which will essentially schedule the learning rate at specific intervals so that our network can train better. So we say callbacks_list is equal to a list containing LearningRateScheduler, and we pass in canaro.lr_schedule. And since we're using LearningRateScheduler, let's go and import it: from tensorflow.keras.callbacks import LearningRateScheduler. And that should about do it. The generator, model, and callbacks setup is sketched below.
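Here's that setup as one sketch. Note that canaro.generators.imageDataGenerator, canaro.models.createSimpsonsModel, and canaro.lr_schedule are canaro's API as used in this video, so treat the exact argument names as assumptions to verify against the package you install:

```python
from tensorflow.keras.callbacks import LearningRateScheduler
import canaro

BATCH_SIZE = 32
EPOCHS = 20

# Generator that synthesizes new images from the existing ones
datagen = canaro.generators.imageDataGenerator()
train_gen = datagen.flow(x_train, y_train, batch_size=BATCH_SIZE)

# Pre-built, pre-compiled architecture shipped with canaro
model = canaro.models.createSimpsonsModel(IMG_SIZE=IMG_SIZE, channels=channels,
                                          output_dim=len(characters),
                                          loss='binary_crossentropy', decay=1e-6,
                                          learning_rate=0.001, momentum=0.9,
                                          nesterov=True)
model.summary()

# Schedule the learning rate at specific intervals so the network trains better
callbacks_list = [LearningRateScheduler(canaro.lr_schedule)]
```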
So let's actually go ahead and train the model. We say training is equal to model.fit: we pass in train_gen, we say steps_per_epoch is equal to the length of x_train divided by the batch size, we say epochs is equal to EPOCHS, we give it the validation data, which is a tuple of x_val and y_val, we say validation_steps is equal to the length of y_val divided by the batch size, and finally, we say callbacks is equal to callbacks_list. And that should begin training. And once that's done, we end up with a baseline accuracy of close to 70%. The full fit call is sketched below.
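The fit call itself, continuing from the setup sketch above:

```python
training = model.fit(train_gen,
                     steps_per_epoch=len(x_train) // BATCH_SIZE,
                     epochs=EPOCHS,
                     validation_data=(x_val, y_val),
                     validation_steps=len(y_val) // BATCH_SIZE,
                     callbacks=callbacks_list)
```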
So here comes the exciting part: we're now going to use OpenCV to test how good our model is. What we're going to do is use OpenCV to read in an image at a particular file path, pass that to our network, and see what the model spits out. So let's go to this Simpsons test set and look for a character; first, let's print out our characters list just to see who we trained on. Okay, let's look for Bart Simpson. Probably a bit irritating to scroll through this dataset, but okay, we got an image of Bart Simpson. So click this, grab the path, create a test_path variable, and set this equal to that string. Then what we're going to do is say img is equal to cv.imread(test_path). And just to display this image, we can use plt.imshow, pass in the image, give it a colormap of gray, and do a plt.show(). Okay, so this is an image of Bart Simpson. So what we're going to do now is create a function called prepare, which will basically prepare our image to be of the same size, shape, and dimensions as the images we used to
train the model on. This will take in an image. First, we'll convert the image to grayscale: img is equal to cv.cvtColor, passing in the image and cv.COLOR_BGR2GRAY. Then we resize it to our image size, so img is equal to cv.resize, resizing the image to IMG_SIZE. Then we reshape the image, with caer.reshape, to be of IMG_SIZE with channels equal to 1, and we return the image. So let's run that, and let's go down here and say predictions is equal to model.predict(prepare(img)). We can look at these predictions, so let's print them, and essentially, this is what we get: an array of class probabilities. To print the actual class, we can print characters[np.argmax(predictions[0])]. And to visualize the test image, we can do a plt.imshow, pass in the image, and do a plt.show(). The full inference step is sketched below.
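A minimal sketch of that inference step. The test path is illustrative; note that I've used a plain NumPy reshape here in place of caer.reshape, and that I normalize the test image to match the (0, 1) range the network trained on, which is an addition of mine rather than something shown in the video:

```python
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

# Illustrative path into the Kaggle test set
test_path = r'../input/the-simpsons-characters-dataset/kaggle_simpson_testset/bart_simpson_0.jpg'

def prepare(img):
    """Shape a test image like the training data: an 80x80 grayscale tensor."""
    img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    img = cv.resize(img, IMG_SIZE)
    img = img / 255.0                        # match the (0, 1) training range
    return img.reshape(1, *IMG_SIZE, 1)      # batch, height, width, channels

img = cv.imread(test_path)
predictions = model.predict(prepare(img))

# The class with the highest probability is the model's guess
print(characters[np.argmax(predictions[0])])

plt.imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))   # convert BGR for Matplotlib
plt.show()
```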
Okay, so this is our image, and right now our model thinks that Bart Simpson is, in fact, Lisa Simpson. Let's try another image, maybe bart_simpson_28. Go up there, change that to 28, and run that. This is Bart Simpson; let's run this... and again, we got Lisa Simpson. So let's try a different image. Here we go, Charles Montgomery Burns: copy this, paste it all the way down there, predict, and we get Milhouse Van Houten. Okay, definitely not the best model that we could have asked for, but hey, it's a model. Right now this thing has a baseline accuracy of 70%, although I would have liked it to get to at least 85%; in my tests, it had gone close to 90, 92%. I'm not sure exactly why this run went to 70%, but again, this is to be expected, since building deep computer vision models is
a bit of an art, and it takes time to figure out what the best model for your project is. So that's it for this Python and OpenCV course. This course was basically a general introduction to OpenCV and what it can do, and of course, we've only just scraped the surface; there's really a whole new world of computer vision out there. Now, while we obviously can't cover every single thing that OpenCV can do, I've tried my best to teach you what's relevant today in computer vision, and really one of its most interesting parts: building deep learning models, which is in fact where the future is. Self-driving vehicles, medical diagnosis, and tons of other things are areas where computer vision is changing the world. And so, all the code and material that was discussed throughout this course is available on my GitHub page, and the link to that page will be in the description below. And just before we close, I do want to mention that although I did recommend you install caer in the beginning, we barely used it throughout the course. Now, that's probably not going to make sense to you right now, but if you plan to go deeper into computer vision, into building computer vision models, caer will actually prove to be a powerful package for you. It has a lot of helper functions to do just about anything. Now, I'm constantly updating this repository, and if you want to contribute to these efforts, definitely do that: you can send a pull request with your changes, and if it's helpful, it will be merged into the official codebase, and you'll be added as a contributor. If you move on to building deep learning models with Keras, then canaro will be useful to you too. But again, for the most part, it's usually TensorFlow and Keras that you'll be using. So anyway, with that said, I think I'll close up this course. If this course helped you in any way and got you more interested in computer vision, then definitely like this video and subscribe to my channel, as I'll be putting up useful videos on Python, computer vision, and deep learning. So I guess that's it. I hope you enjoyed this course, and I'll see you in another video.