MIT 6.S191: Deep Generative Modeling

Alexander Amini
MIT Introduction to Deep Learning 6.S191: Lecture 4, Deep Generative Modeling. Lecturer: Ava Amini
Video Transcript:
This next lecture is one of my favorite lectures, both in terms of the topic and the relevancy of generative modeling. I think you may be here because you recognize that we're in this tremendous age of generative AI. Today we're going to learn the foundations of generative modeling, where we'll talk about building systems that can not only look for patterns in data and use them to make a decision, but also go a step further and actually generate new instances of data based on those learned patterns.
This is a really complex and powerful idea, and it's exploded in the past couple of years across a range of applications. We've already gotten a sense of how powerful and tremendous these types of generative models are. We started off the lecture yesterday with advances in image generation and audio generation, and here we're going to continue on that same theme.
So, here are three examples of faces. I want you all to take a moment and think about which one of these three faces is real. Raise your hand if you think it's face A, face B, or face C.
All right, I'd say A was the majority. But in fact, none of these faces are real. They're all fake.
So this really highlights the realism of these methods and the power of generative modeling. Indeed, with generative modeling, we're able to tackle a new set of learning problems. So far in the course, we've been looking at problems of what we call supervised learning, which means that we are given a set of data.
And that data comes with labels. And our goal is to learn a mapping from the input data to a label like a class label or a numerical value. And in this class, because it's a class on deep learning, we've been concerned about learning these mappings that are defined by deep neural networks.
But fundamentally that function that maps from data to label could really be anything. Now we're going to turn our attention to a brand new set of problems of what we call unsupervised learning. And in this case, we have just data.
We don't necessarily have labels. Our goal is to build up a deep learning model that can learn the underlying structure and patterns in the data. This can give us insight into the underlying distribution of the data and be used to actually generate brand new data instances.
The fundamental principle of generative modeling, which is an instance of unsupervised learning, is that given samples from some training set, we want to learn a model that represents the distribution the data was sampled from. And we can use such a generative model, which models this probability distribution, in a couple of ways. The first is what we call density estimation: given some set of samples drawn from some probability distribution, we want to train and build up a model that can learn the underlying probability distribution that describes and captures the space from which this data came. Once we learn this distribution, we can sample from it to actually generate brand new data instances, because we've built up a model that captures the underlying probability distribution of the data. What this means is that generated samples reflect that distribution.
In both cases, our underlying question is: how can we effectively learn an estimate of the probability distribution P of X that's as close as possible to the true data distribution? So this is the core principle of generative modeling: trying to learn a good probability distribution that represents your data.
Why do we care about generative modeling, and how can we leverage it in the real world? Well, first of all, by doing this density estimation and probability distribution learning, generative models are capable of uncovering the underlying features in a data set.
For example, if we're given a data set of human faces, there could be many, many different faces, and a priori we may not know the underlying distribution of these faces in terms of features like hair color, skin tone, pose, clothing, and so on. Our training data may be over-represented or under-represented with respect to particular features, and we may not know this. Using generative models, as you'll see in today's lab, we can actually automatically uncover and identify these features, learn which ones are more or less represented, and use this to create fairer and more representative data sets.
Another great example and use case is outlier detection. In the example of self-driving cars and autonomy that Alexander introduced, it's really critical to be able to ensure that the car can handle all possible scenarios it may encounter on the road, not just those that are most common. Using a generative model, we can identify those rare events, those outliers at the tails of our probability distribution, and use this to detect those rare instances and adjust the control of the car accordingly.
Finally, perhaps the use case that many of us are most interested in or familiar with is actually using the generative model to create new data samples and new data instances. Because these generative models are modeling and learning probability distributions, sampling from that distribution yields new data instances. And this is really the fundamental backbone of generative AI.
And in this course, you'll see how this concept is really powerful in domains from natural language to computer vision to even the sciences and healthcare. So hopefully this motivates and sets the stage for why we really care about generative models. Today we'll focus on two foundational classes of generative models: latent variable models built around autoencoders, and a really foundational architecture that we call generative adversarial networks. Both of these foundational classes are what we call latent variable models. And to get into that, the very first thing we need to do is understand what exactly a latent variable is.
I think a really great example to demonstrate this is a story from an ancient work, Plato's Republic, known as the myth of the cave or the allegory of the cave. In this story, there's a group of prisoners.
And as part of their punishment, they're constrained to face a wall. The only things that they are able to see are shadows of objects that pass in front of a fire that's actually behind them. They can only see the reflections or the shadows of those objects on the wall.
And so to these prisoners, those shadows, those observations, are their reality. But they can't actually directly see or measure the true structures that are casting the shadows they observe. Those objects behind the prisoners' heads are like the underlying features, the latent variables, that govern the distribution of a data set.
All we have is some way to observe the data instances. But maybe what we ultimately care about is capturing those driving underlying features, or latent variables, that yield the observations we can make. So one way to think about our goal in generative modeling is to find an effective way to capture and extract these underlying latent variables when we are only given observed data.
The very first way we're going to approach this is using a simple and intuitive generative model that we call an autoencoder. What these models try to do is self-encode the input data that they see. To take a look at how the autoencoder works, let's consider this simple example of handwritten digit images.
Here we take the image as raw data and pass it through a deep neural network; let's say it's composed of convolutional layers. Ultimately, what an autoencoder does is take these successive layers and output a low-dimensional latent space, a representation of the data, which is what we're trying to model and predict.
We call this network an encoder because we're mapping the data X into this set of latent variables Z. Now let's ask ourselves: why would we care about the dimensionality of this set of latent variables Z? Any ideas on why we would care about how many latent variables we're trying to learn?
[Audience]: We want to abstract the most important features that might not necessarily be needed for other... Exactly. We're trying to extract and distill down to those most important features. Keeping a lower dimensional latent space means that we're effectively compressing the data, encoding it into a small latent vector and learning a very compact, very rich feature representation.
Now the question is: if we are able to generate this lower dimensional latent space, how do we actually train the model to learn this latent variable vector? Do we have any explicit examples or labels of this training data Z that we try to model? Can we do normal supervised learning?
[Audience]: In principle we have the data, but we don't have labels for it... Yeah, so if we had some prior on the features we care about, maybe we could try to supervise against that. But remember, our fundamental goal is to uncover these features automatically. We want to learn them directly from data, without any supervision, so that we don't bias our perspective on what the actual important features are; we want to discover them in a data-driven way. So the idea is, instead of using labels that are prespecified by us as humans, we can use the data itself as a signal to learn these features in an unsupervised way. And the way we can do that is by not assuming any labels for what these features are.
Instead, what if we just try to reconstruct the input? We train a neural network to go from the space of pixels of the image down to this lower dimensional space, and then build a decoding model that can reconstruct the original image from that latent space. Now we can compare the original input to the reconstructed output and train the network by simply minimizing the distance between them. A way we can do this easily for images is to take something simple like the mean squared error between the input and the reconstructed output: subtracting the two images from each other and squaring the difference, which gives us a pixel-to-pixel metric of the difference between the input and the reconstruction. And so what you can probably appreciate is that if we look at the loss function we're actually using to train the network, it doesn't require any specification of labels or of what the identity of these important features is. We can just use the signal of reconstructing the input as a way to learn these features.
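To make this concrete, here is a minimal sketch of that reconstruction-loss idea in PyTorch. The layer sizes, the two-dimensional bottleneck, and the random stand-in batch are illustrative assumptions, not the lecture's exact architecture.

```python
import torch
import torch.nn as nn

latent_dim = 2  # assumed small bottleneck, purely for illustration

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, 784))

x = torch.rand(32, 784)          # a stand-in batch of flattened 28x28 digit images
z = encoder(x)                   # compress into the low-dimensional latent space
x_hat = decoder(z)               # reconstruct back to pixel space

# Reconstruction loss: mean squared error between the input and its reconstruction.
reconstruction_loss = nn.functional.mse_loss(x_hat, x)
reconstruction_loss.backward()   # gradients flow to both encoder and decoder
```

Note that nothing in this loss mentions labels: the input itself is the training signal.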
Now let's simplify this plot a little bit and abstract away the individual layers in the two components of our autoencoder, the encoder and the decoder. And again, our loss function doesn't require any labels at all.
This is a really powerful idea, because now we're getting into the capability of unsupervised learning. We can learn a quantity like those latent variables in that latent space Z that we're interested in, without any prespecification and without the need to directly observe them. Back to the question about why the dimensionality of that latent space matters.
Well, it has a huge impact on the richness of the features you're able to extract and the bottleneck you place. If we think about this notion of autoencoding as a form of compression, then the lower the dimensionality of the latent space, the more we're compressing our data, and the worse we're able to do at the reconstruction. So the key concept of this autoencoding approach is that we can use this bottleneck hidden layer to extract these latent features Z without any prespecified or supervised labels, using a loss called the reconstruction loss that forces the network to encode as much information as possible through this process of having to decode back and reconstruct the input.
In this way, we're automatically encoding information from the data into the latent space Z. This is the principle of an autoencoder. In practice, though, what is commonly used is a variation on this idea that introduces a little bit more diversity to be able to actually generate new data samples.
And that's the concept of variational autoencoders, or VAEs, which are really a workhorse generative model for latent variable modeling. As we just saw, in a traditional autoencoder the operation is deterministic. In our encoder, we have normal layers in a neural network, just like any other layer that we've seen before.
And we have normal layers in our decoder that take the latent variables and decode back out to generate an output. What this means is that if we feed in an input, we're going to get the same reconstructed output so long as the weights are the same. If we don't change the weights, a given input will always produce the same reconstructed output.
The way to think about this is as a deterministic encoding that allows us to reconstruct an input. With VAEs, in order to learn a good quality latent space, we actually need to introduce some randomness, some stochasticity, into this autoencoder. That's the principle behind variational autoencoders, or VAEs.
What we do with VAEs is, instead of having a purely deterministic layer Z, we introduce some sampling, some notion of stochasticity. Instead of learning the latent variables directly, we parameterize each latent variable as a probability distribution defined by a mean mu and a standard deviation sigma. We learn these vectors of means and standard deviations separately, so that we get a distribution over each of the latent variables in our latent space. What this means is that it's simply a probabilistic twist on the concept of autoencoding.
In effect, this means we can now sample from these means and standard deviations to produce new data instances. What you can see here is that VAEs follow the same concept of encoding down to a latent space and decoding back out to the original data space, just with this probabilistic twist, where we can sample through these latent variable layers. Now let's make this a little more formal.
Effectively, what the encoder and decoder are doing is trying to learn probability distributions. The encoder computes a probability distribution of the latent space given the input data X, and the decoder does the reverse, trying to compute the probability distribution of the data given the latent variables Z. These two networks are defined by two separate sets of weights, phi and theta.
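As a rough sketch of what the encoder side might look like in code (assuming PyTorch; the dimensions and the choice to output log standard deviations are illustrative assumptions), it is an ordinary network with two output heads, one for the means and one for the standard deviations of the latent distribution:

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Maps an input x to the parameters (mu, log_sigma) of a Gaussian over z."""
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)         # vector of means
        self.fc_log_sigma = nn.Linear(hidden_dim, latent_dim)  # vector of log standard deviations

    def forward(self, x):
        h = self.backbone(x)
        return self.fc_mu(h), self.fc_log_sigma(h)
```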
But we can train this network completely end to end with one loss function, which is a function of the input data and these two sets of weights, phi and theta. If we take a closer look at what this loss looks like, we have two terms. One is the reconstruction loss, which is very similar to what we saw in the standard autoencoder, and the second is a new term called the regularization term, which we'll get into in closer detail in a little bit. The reconstruction loss is the same as before: you can think about it as the error, or the comparison of your input to your reconstructed output, and you want that to be as low as possible, right?
Do a faithful reconstruction. Now, with the regularization term, things get a little bit more interesting. Recall that we're trying to learn a probability distribution over these latent variables given the data X, and that's what's computed by the encoder.
As part of regularizing this probability distribution, we want to make some initial prior guess about what these distributions of the latent variables should look like. This helps us infer and enforce latent variables that follow this prior. And so when we introduce this regularization term, we need a way to compute a sense of distance between the distribution of those latent variables and our prior about what that probability distribution should look like.
To break this down a little further, D is effectively just a distance metric. What we're comparing is how similar the probability distribution of an inferred latent variable is to some prior, some initial guess we place on what that distribution should look like. To understand this a little further, let's first think about how we even choose this prior and why it's important.
One very common way to choose a prior for a VAE is to do something simple: let's say we're going to promote these latent variables to roughly follow a normal distribution centered around a mean of zero and having a standard deviation of one. This encourages the encoder in our VAE to place the latent variables evenly and smoothly around the center of the latent space and to distribute the encodings so that we cover the latent space smoothly. Just like with any other loss function, like the cross entropy loss or the mean squared error, we need a metric that computes how close we are to that prior. One very common way to do this is a distance metric that compares how close two probability distributions are to each other, and this metric is called the KL divergence.
The equation for the KL divergence with a normal prior is shown here; effectively, you can think of it as a way to compute and quantify the distance between two probability distributions.
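The slide itself isn't reproduced in this transcript, but the closed-form expression commonly used for this term, the KL divergence between the encoder's diagonal Gaussian and a standard normal prior, looks like this as a sketch (assuming PyTorch and the mu / log_sigma parameterization from the encoder sketch above):

```python
import torch

def kl_to_standard_normal(mu, log_sigma):
    """D_KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions, per example."""
    return -0.5 * torch.sum(1 + 2 * log_sigma - mu**2 - torch.exp(2 * log_sigma), dim=-1)
```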
So right now you're probably thinking: okay, I get the principle that we're trying to learn a probability distribution over these latent variables, and we want some prior on how they look. But why is this actually important? Why do we even need to place these priors?
And why do we need to regularize our latent space? To do that and to gain intuition, let's first consider what properties we want our latent space to actually achieve. The first thing we care about in learning these features is this notion of continuity.
Meaning that if we have two data instances and they end up close to each other in the latent space, there should be similar content in those two data instances; closeness in the latent space should correspond to closeness in the data space. The second criterion is this idea of completeness.
We don't want there to be gaps, meaning that if we sample from somewhere in our latent space, we should still be able to decode back out some meaningful data instance. It should not be completely random or complete garbage.
To build this intuition a little further, let's now consider the consequences of placing no priors on these latent variables and not regularizing them. Without regularization, we could have instances where two points in the latent space have completely different meanings in our original data space. So in this example, we're thinking about shapes, right?
Squares, triangles, and circles. Here, if a square and a triangle end up close in the latent space but are not similar when decoded back into the original data space, that's not desired. We want triangles to map closer to triangles than they do to squares.
Secondly, if we don't have completeness and we just draw some sample from the latent space, we could end up with nonsensical squiggles back out, not meaningful shapes. In contrast, what regularization allows us to do is encourage continuity, where points that end up close in the latent space are semantically related, similar shapes, and also completeness, meaning that we can sample from anywhere in this grid of the latent space and still get a good shape out when we decode. So this is a really important concept: using regularization of the latent space to ensure continuity and completeness.
What is important to keep in mind is that just choosing some probability distribution doesn't necessarily guarantee these properties. Without a good term and a good prior for regularization, we could still have issues with the quality of our latent space. It turns out that by using a normal prior, we're able to regularize the latent variables accordingly by centering the means and controlling the spread of the variances so that we encourage continuity and completeness in these latent spaces.
So hopefully that conveys some intuition not only about this concept of regularization but about why a prior like a normal prior can often work very well to build up a latent space that is both continuous and complete. This, at its core, gets at the concept of latent variable modeling and how, with VAEs, we can go from encoding into the latent space to decoding back out. The last step is to define exactly, with these components of the loss function, the reconstruction term and the regularization term, how we can actually train the network end to end.
The problem we run into is that because we have this probabilistic latent space, we can't pass gradients back through it. We can't backpropagate normally like we saw in lecture one; backpropagation requires deterministic neural network layers and nodes.
The key idea that unlocked the ability to train VAEs is a clever modification of this sampling operation that allows for training end to end using backpropagation. Instead of sampling our latent variables directly from a normal distribution parameterized by a mean and standard deviation, we can instead compute our latent variables as the sum of a fixed, deterministic set of means mu and a fixed set of standard deviations sigma, where those standard deviations are scaled by another term that captures all the randomness we care about, all the stochasticity; we call this constant epsilon. In practice, this means that z is the sum of mu plus sigma scaled by this random constant drawn from a standard normal distribution.
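A minimal sketch of that reparameterization, again assuming PyTorch and the mu / log_sigma outputs from the encoder sketch above:

```python
import torch

def reparameterize(mu, log_sigma):
    # All the randomness lives in epsilon ~ N(0, 1); z is then a deterministic,
    # differentiable function of mu and sigma, so gradients can flow through it.
    epsilon = torch.randn_like(mu)
    return mu + torch.exp(log_sigma) * epsilon   # z = mu + sigma * epsilon
```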
We can visualize this very clearly with a diagram like this and see how it affects backpropagation and actually training the network. Originally, if we try to go directly through this sampling node Z, we can't backpropagate and we can't pass gradients through. Instead, if we break out Z so that all the sampling randomness is encoded in this stochastic constant epsilon, we can backpropagate and pass gradients directly through Z all the way back to the input X.
This is really important, because now we can pass gradients through end to end and learn these latent variables in a data-driven way. Finally, once we've trained our network, what is powerful about VAEs is that we can go in and inspect those latent variables to see what features were actually learned by the model in this process of encoding and decoding. And what's really beautiful and intuitive is that often those features have interpretable meaning back in the original data space.
To see how we can do this in the context of images, let's say we take a VAE trained on images of faces and now sample from the latent space to generate new data instances by decoding. If we keep all but one variable fixed and slowly perturb the value of that single latent variable, you can see that it corresponds to a very interpretable feature in terms of what in the image is actually being picked up by this latent variable. In this case, it's the pose, the shift of the head in terms of how it's looking at the camera.
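A sketch of what such a latent traversal might look like in code, assuming a trained PyTorch decoder and the small latent dimensionality used above (the function and variable names are hypothetical):

```python
import torch

def traverse_latent(decoder, z_base, dim, values):
    """Hold every latent variable fixed and sweep a single dimension `dim`."""
    images = []
    for v in values:
        z = z_base.clone()
        z[:, dim] = v                 # perturb only one latent variable
        images.append(decoder(z))     # decode back to image space
    return images

# e.g. traverse_latent(decoder, torch.zeros(1, 2), dim=0,
#                      values=torch.linspace(-3.0, 3.0, steps=9))
```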
And so this gets at the idea that with a method like a VAE, we're able to capture different latent variables that correspond to different interpretable features in the space of the data we care about. As you'll see in the software lab, we can use these models that uncover underlying latent variables as a means to identify which features are relatively well represented in a data set and which are under-represented. This has really important implications when we think about what biases or under-representations could exist in the data sets we use to train these deep learning models.
And you'll get hands-on practice with this in the software lab, where you'll actually build a VAE model to automatically uncover latent features in data sets of human faces. All right. So, to summarize the key concepts of these autoencoder approaches like VAEs: they're able to compress data into a set of latent features of lower dimensionality.
We can do this completely unsupervised, without any labels, by reconstructing the original input back through a decoder. From this representation, we can generate the reconstructions in an entirely unsupervised way. We also saw how we can use reparameterization to train these networks end to end.
And we saw how these latent variables actually have key meaning, corresponding to important and interpretable features in the data space we care about. Finally, by sampling from those latent variables, you can use the decoder portion of the network to generate brand new instances of your original data space. So really, VAEs give us a very powerful way to combine this concept of density estimation over a set of latent variables with sample generation through decoding.
But what if our primary goal is to generate new samples as the output, and we don't care so much about having a set of latent variables that could potentially be interpretable? One very early and very foundational approach to doing this, focusing more on sample generation, is called GANs, or generative adversarial networks. The principle of GANs is motivated by the case where we care just about generating new data instances that are very similar to the data that we trained our model on.
What this means is that we're trying to sample from a really complex distribution, and we want our model to learn a good way to approximate that distribution. It can be very difficult to learn this distribution directly. So instead, what GANs propose is an approach to transform something really simple, even as simple as completely random noise, into the actual data distribution that we care about.
And as we'll see, the GAN approach trains a generator model that can learn a mapping from a distribution of completely random noise to the distribution of our training samples. The approach that GANs take is actually kind of clever and fun. Our goal is to try to produce this distribution of generated data that's as close to the real deal as possible.
GANs proposed an approach to doing this by putting these two competing neural networks side by side with each other, a generator and a discriminator. And the concept is that these are in a sort of a battle with each other. The generator network is tasked with trying to go from this distribution of completely random input noise to produce an imitation of the data.
Then the discriminator looks at generated instances that are supposedly an imitation of the data, and its goal is to discriminate what is real and what is fake: to learn a classifier that distinguishes real versus fake. By having these two networks compete against each other, we force the discriminator to learn really well how to delineate real versus fake. And the better it becomes at doing that, the more it forces the generator to become better and better at producing fake data that can fool the discriminator.
To get a sense of the intuition behind this and how it works, let's start with one of my favorite illustrations of this example. Let's say we have data on a one-dimensional line, and there are some instances of fake data.
They're going to lie somewhere on this line. The generator, remember, is starting from completely random noise, so initially, without any training, it's not going to be very good at producing fake data.
It produces some instances and now the discriminator sees these points and it also sees some real data. Its task is to be trained to output a probability that any given data example it sees is real. So, we're going to train the discriminator.
And at the beginning, its predictions may not be that great, but we train it. And it starts to increase the probabilities of what's real, decrease the probabilities of what's fake until we get a perfect separation where the discriminator can perfectly distinguish what's real and what's fake. Now the generator comes back and it sees where the real data lie and it starts moving its generated fake data closer and closer and closer to those instances.
And now the discriminator receives these new points, this new round of generations. And again, its task is to train and learn how to delineate real versus fake. It's going to estimate the probability that any given point is real and learn to decrease the probability on the fake points and increase the probability on the real points.
We repeat again one last time. The generator starts moving its creations closer and closer to the real data, such that the fake data are almost exactly overlapping the locations of the real data. At this point, it's going to be very hard for the discriminator to apply its classifier to these instances and do a good job of delineating real versus fake, because those generated instances are so close to the real data.
So really, this is the core intuition behind how these two components of a GAN are essentially competing with each other. Bringing this back to our diagram, we can summarize: the generator is just trying to create new data instances to fool the discriminator.
And the discriminator is trying to identify those instances as fake, through this competing objective with the generator. Ideally, we reach an optimum where the generator is able to perfectly reproduce the true data distribution and completely fool the discriminator's ability to classify. So this is really a fun and interesting idea that initially led to a lot of excitement about generative models.
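A minimal sketch of how that competition is typically implemented in practice, assuming PyTorch, a generator and a sigmoid-output discriminator defined elsewhere, and binary cross-entropy on "real" versus "fake" labels as a stand-in for the objective discussed later in the Q&A:

```python
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, g_opt, d_opt, real_batch, noise_dim):
    batch_size = real_batch.size(0)

    # --- Discriminator update: push real data toward label 1, fakes toward label 0 ---
    fake = generator(torch.randn(batch_size, noise_dim)).detach()
    d_loss = (F.binary_cross_entropy(discriminator(real_batch), torch.ones(batch_size, 1))
              + F.binary_cross_entropy(discriminator(fake), torch.zeros(batch_size, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Generator update: try to make the discriminator label its fakes as real ---
    fake = generator(torch.randn(batch_size, noise_dim))
    g_loss = F.binary_cross_entropy(discriminator(fake), torch.ones(batch_size, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    return d_loss.item(), g_loss.item()
```

The alternation matters: each network is updated while the other is held fixed, which is exactly the back-and-forth described in the one-dimensional illustration above.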
Fundamentally, what I want you all to take away is that you can think about these generative models like GANs as learning a distribution transformation: going from pure Gaussian noise, random noise, to eventually being able to produce instances across a learned target distribution. When the trained generator of a GAN is synthesizing new instances, what it has done is use its learned mapping from that distribution of Gaussian noise to the target data distribution, taking one point somewhere in that noise distribution to a particular output in the target data space. If we consider some other initial point in that distribution of noise and feed it through the generator, it would produce a new instance somewhere else in the target data distribution, somewhere else on the data manifold.
And what we can actually do is smoothly interpolate and traverse in the space of completely random noise. It turns out this yields smooth interpolations in the learned data manifold. As you can see, the results are kind of striking, where we can see very concretely a transformation in series that reflects this sort of interpretable traversal of the data space.
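A sketch of that interpolation, assuming a trained PyTorch generator and two noise vectors of shape (1, noise_dim):

```python
import torch

def interpolate(generator, z_start, z_end, steps=8):
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    z_path = (1 - alphas) * z_start + alphas * z_end   # linear blend of the two noise vectors
    return generator(z_path)                           # decode each point on the path
```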
And so this is really a powerful concept that gets at this notion of distribution transformation. We can do something as seemingly wild as going from a distribution of completely random noise to some target. We can also do things like going from one source that's not random to another target source.
There have been variations on GANs, such as CycleGANs, that propose exactly this. Instead of starting from Gaussian noise and moving to a target data manifold, we can consider one data manifold and try to learn a mapping that takes us to another data manifold. This is really cool because it leads to some clear capabilities in things like style transformation.
So in images, for example, we can do completely unpaired translation of images from one style to another using this architecture called a CycleGAN. This visualization is showing you a CycleGAN that's trained on natural images, and here we're seeing a domain transformation from a domain of horses to a domain of zebras.
This is effectively a concept of style transfer. The way CycleGANs work is that they employ a kind of cyclic loss, where you actually have two sets of generator and discriminator, one for each of the domains, and an additional co-dependency that links the two domains together. In fact, we ourselves, just a few years back, built a CycleGAN to do something that we introduced yesterday in the beginning lecture, which is speech and audio transformation.
It turns out that you can actually represent audio waveforms as images called spectrograms. Very similar to what we saw in the horse-to-zebra example, what we did five years back was train a CycleGAN to transform speech spectrograms from one style to another. And if you recall, Alexander showed the snippet of the video of Obama introducing the course that we used a few years ago.
As you saw, that video was synthetic. Unfortunately, Obama did not actually come and grace us with his presence to introduce the class. What we did instead was take several hours of audio recordings, convert those recordings into spectrogram images for both the source voice and the target voice, and train the CycleGAN to do this unpaired translation from one voice domain to another.
[Audience question about the loss function that is used.] Yes, that's a great question; let me pull it up, I actually have it right here. The core principle is that we want to put the discriminator and the generator together and optimize them jointly.
Right? So, I can actually pull up the exact loss function. So the loss term is based on the same intuition of comparing the real data with the generated data instances.
First, if we consider the loss term from the perspective of the discriminator, what it's seeking to do is maximize the probability that real data is correctly identified as real and that generated data is correctly identified as fake. As you can see, that involves a discrimination over the generator's output G as well as a discrimination over the real data X, fake versus real; again, this is focusing on the discriminator. The generator, on the other hand, is trying to minimize the probability that its generated data is identified as fake.
So it's that same objective, but the generator is doing the converse: we're trying to minimize the probability that fake data is classified as fake. We want to move it as close to the real data as possible.
And in concert, we can put this together: we have one term that is jointly maximized by the discriminator and minimized by the generator. It's this dual, competing objective, where again the core intuition is that we want to optimize the discriminator to do as well as possible at delineating real versus fake, and optimize the generator to produce instances that are as realistic as possible.
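For reference, the standard minimax objective from the original GAN formulation, which matches the description above (the slide itself is not reproduced in the transcript, so this is a reconstruction of that standard form):

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

The discriminator D pushes this value up by assigning high probability to real data x and low probability to generated samples G(z); the generator G pushes it down by making D(G(z)) as close to one as possible.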
Yes, that's a great question. It is differentiable, but it turns out that in practice it can be very difficult to train GANs stably. And so there have been works that propose variations on this loss or introduce other methods to try to improve the stability.
Awesome. So, to summarize and conclude with GANs: the core principle is this competition between these two networks, which can be optimized and trained jointly through the objective that we just saw.
And this is in contrast with a more density-estimation-focused approach like a VAE, where our goal is really to learn underlying latent variables that capture the core features present in the data and give us a lower dimensional representation of our data examples. Now, these are really foundational approaches to generative modeling, and they've paved the way for great advances in the past couple of years that have yielded very high fidelity, high quality generations across a number of domains. We're going to see in tomorrow's lectures some of the cutting edge, new frontier advances in generative modeling, including diffusion models and language models, which are the driving tools behind the advances in generative AI we see today.
So I'll end by giving you a sneak preview: we have much more to come in lectures six and ten on diffusion models for different domains. Hopefully, this gets you excited about the promise and the power of generative modeling. You'll get hands-on experience in today's software lab, which bridges the themes of computer vision with the themes of VAEs and generative models, where you actually train a CNN-based VAE to do facial detection and debiasing of facial detection systems. With that, we will conclude; links to download and get started with the labs are available online.
Before we close and break into office hours, I believe we have our TAs who have joined us today. If you can raise your hands or come down to the front, that would be great, so everyone in the class can recognize you and identify you for help on the software labs. Yes, question?
Yes, Piazza is the place for posting questions asynchronously. Oh, Victory is... yeah. So, we're going to break into the office hours session. Victory is one of our amazing TAs; she will be able to help answer questions related to TensorFlow, PyTorch, and the labs.
And of course, if you have questions on the lecture content itself, Alexander and I will be here as well. Thank you.