Every new TV show and movie that uses computer-generated images and special effects relies on Ray Tracing. For example, in order to build an interstellar battle, set in a galaxy far, far away, 3D artists model and texture the spaceships, position them around the scene with lights, a background, and a camera, and then render the scene. Rendering is a computational process that simulates how rays of light bounce off of and illuminate each of the models, thus transforming a scene full of simple 3D models into a realistic environment.
There are many different ray tracing algorithms used to render scenes, but the current industry standard in TV shows and movies is called path tracing. This algorithm requires an unimaginable number of calculations. For example, if you had the entire population of the world working together and performing 1 calculation every second, it would take 12 days of nonstop problem solving to turn this scene into this image.
Due to these incredible computational requirements, path tracing was considered impossible for anything but supercomputers for decades. In fact, this algorithm for simulating light was first conceptualized in 1986; however, it took 30 years before movies like Zootopia, Moana, Finding Dory, and Coco could be rendered using path tracing, and even then, rendering these movies required a server farm of thousands of computers and multiple months to complete. So, why does path tracing require quadrillions of calculations?
And how does Ray Tracing work? Well, in this video, we’ll answer these two questions, and in the process, you’ll get a better understanding of how Computer Generated Images or CGI and special effects are created for TV and movies. After that we’ll open up this GPU and see how its architecture is specifically designed to execute ray tracing, enabling it to render this scene in only a few minutes.
And finally, we’ll investigate how video games like Cyberpunk or the Unreal Engine Lumen renderer use Ray Tracing. So, let’s dive right in. This video is sponsored by Brilliant.org.
Let’s first see how Path Tracing works and how this dragon and kingdom are created and turned into a setting for a fantasy show. To make the scene, an artist first spends a few months modeling everything: the islands, the castles, the houses, the trees, and of course, the dragon. Although these models may have some smooth curves or squares and polygons, they’re actually all broken down into small triangles.
In short, GPUs almost exclusively work with 3D scenes made of triangles, and this scene is built from 3.2 million of them. After a model is built, the 3D artist assigns a texture to it, which defines the color as well as material attributes, such as whether the surface is rough, smooth, metallic, glass, water-like, or composed of a wide range of other materials.
Next, the completed models are properly positioned around the scene, and the artist adds lights such as the sky and the sun and adjusts their intensity and direction to simulate the time of day. Finally, a virtual camera is added and the scene is rendered and brought to life. As mentioned earlier, path tracing simulates how light interacts with and bounces off every surface in the scene, thereby producing realistic effects such as smooth shadows across the buildings or the way light interacts with the water and produces bright highlights in some areas and water-covered sand in others.
In the real world, light rays start at the sun, and when they hit a surface such as this red roof, some light is absorbed while the red light is reflected, thus tinting the light based on the color of the object. These now tinted light rays bounce off the surface and make their way to the camera and produce a 2D image. With this scene, a near infinite number of light rays are produced by the sun and sky and only a small fraction of them actually reach the camera.
Calculating an infinite number of light rays is impossible and only the light rays that reach the camera are useful, and therefore with path tracing we don’t send rays out from the sky or light source, but rather we send out rays from a virtual camera and into the scene. We then determine which objects the rays hit and calculate how those objects are illuminated by the light sources. With computer-generated images or CGI, the 2D image is represented by a view plane in front of the virtual camera.
This view plane has the same pixel count as the final image, so a 4K image has 8.3 million pixels. Furthermore, by animating the camera around or changing its field of view, the view plane will correspondingly change.
Let’s transition to an indoor scene such as this barbershop, which contains 8 million triangles and is actually more complicated than the island kingdom. In order to create this image on the view plane, a total of 8.3 billion rays, a thousand for each pixel, are sent out from the virtual camera through the view plane and into the scene.
Ray Tracing is a massively parallel operation because each pixel is independent from all other pixels. This means that the thousand rays from one pixel can be calculated at the same time as the rays from the next pixel over and so on. Once a single pixel’s rays finish flying around the scene, the results are combined with the other rays and pixels to form a single image.
If we were to show billions of rays, the scene would quickly become inundated with lines, so let’s simplify it down to a single ray running through one pixel of the viewing plane. This ray starts at the camera, travels through a random point in the pixel and into the scene. It flies straight and eventually hits a triangle, and once it does, that object’s color becomes associated with that ray and pixel.
For example, when the ray hits this chair, the pixel becomes red. The other nearby rays running through random places in the same pixel will hit pretty close to this ray and have their colors averaged together. These rays are called primary rays, and they answer the question of which triangle and object the rays hit first and what basic color belongs in that specific pixel.
Another example is that these rays running through this pixel hit the blue stripe on the barbershop pole turning the pixel blue. The other billions of rays do the same thing resulting in a single image with the proper 3D perspective from the virtual camera. This image is fairly flat colored because each pixel just has the simple color of the object the rays hit.
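To make the idea concrete, here's a minimal Python sketch of how a renderer might generate these primary rays, assuming a simple pinhole camera sitting at the origin and looking down the negative Z axis with the view plane one unit away; the names and camera setup are purely illustrative rather than how any particular renderer does it.

```python
import random

def normalize(v):
    # Scale a 3D vector to unit length.
    length = sum(c * c for c in v) ** 0.5
    return tuple(c / length for c in v)

def primary_ray(pixel_x, pixel_y, width, height):
    # Jitter the sample position inside the pixel so that the thousand rays
    # per pixel land at slightly different spots and can be averaged together.
    u = (pixel_x + random.random()) / width * 2.0 - 1.0
    v = (pixel_y + random.random()) / height * 2.0 - 1.0
    origin = (0.0, 0.0, 0.0)                     # camera sits at the origin
    direction = normalize((u, v, -1.0))          # through the view plane at z = -1
    return origin, direction

# Example: a thousand jittered rays through one pixel of a 4K (3840 x 2160) image.
rays = [primary_ray(1920, 1080, 3840, 2160) for _ in range(1000)]
```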
So the next question is: how is the location where the primary ray hits illuminated by the light sources, and how bright or dark should the pixel’s color be? For example, when you look at the blue stripe of the barbershop pole, the entire stripe is just blue, but in the rendered image, there’s a gradient from bright to dark across a number of pixels depending on how the triangles are facing the lights and the window. Specifically, the dark blue backside doesn’t face any of the light sources, and therefore its illumination comes only from light bouncing off the nearby walls.
Furthermore, when the lighting conditions change and more light enters the scene, the entire barbershop pole brightens up. This accurate lighting applies to all the objects in the scene and is what transforms the scene and makes it look realistic. In order to accurately determine the brightness of these blue pixels, ray tracing first needs to determine how the surface is illuminated directly by the light sources, which is called direct illumination, and second, how the surface is illuminated by light bouncing off other objects, which is called indirect illumination.
Combining direct and indirect illumination is called global illumination. In order to calculate direct illumination, we start at the intersection point where the primary ray hits the triangle in the barbershop pole and then we generate additional rays called shadow rays and send them in the direction of each light source such as the light bulbs or the sun outside the window. If there are no objects between the intersection point and a light source, then that means that this point on the blue stripe is directly illuminated by that light source.
For each light source that directly illuminates this point, we factor in the light source’s brightness, size, color, distance, and the direction the triangle inside the blue stripe is facing. All these factors are multiplied by the Red, Green, and Blue or RGB values of the blue stripe, which in turn changes the shading or brightness of the pixel that the primary ray went through. Let’s brighten the room again, and you can see the RGB values increase for this pixel.
Now let’s dim the room once more and look at a different pixel whose primary ray hits the backside of the barbershop pole. A similar set of shadow rays are sent out from this intersection point to each light source, but each of these rays is blocked by other triangles in the pole, and thus this point doesn’t receive any direct illumination from any of the light sources, leaving the pixel dark. These rays are called shadow rays because they determine whether a location is directly illuminated by a light source or whether it’s in a shadow.
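Here's a small Python sketch of that direct-illumination step, written under some simplifying assumptions: `occluded` is a hypothetical caller-supplied function standing in for the shadow-ray test, each light is a dict with `pos`, `color`, and `intensity` keys, and the brightness falloff is a simple Lambert-plus-inverse-square model rather than the exact formula any specific renderer uses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def direct_light(point, normal, surface_rgb, lights, occluded):
    # Sum the direct illumination reaching `point` from every light source.
    total = [0.0, 0.0, 0.0]
    for light in lights:
        to_light = tuple(l - p for l, p in zip(light["pos"], point))
        distance = dot(to_light, to_light) ** 0.5
        direction = tuple(c / distance for c in to_light)
        if occluded(point, light["pos"]):
            continue  # the shadow ray is blocked, so this light adds nothing
        # Dimmer when the surface faces away from the light or the light is far away.
        facing = max(dot(normal, direction), 0.0)
        scale = light["intensity"] * facing / (distance * distance)
        for i in range(3):
            total[i] += surface_rgb[i] * light["color"][i] * scale
    return tuple(total)

# Example: one light bulb, nothing blocking it (hypothetical values).
lights = [{"pos": (0.0, 3.0, 0.0), "color": (1.0, 0.9, 0.8), "intensity": 20.0}]
print(direct_light((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.2, 0.3, 0.9),
                   lights, occluded=lambda p, l: False))
```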
You might think that this backside should be entirely black because it’s in the shadows and none of the light rays from the light sources can reach it. However, this backside still has color because it’s illuminated by light bouncing off the walls. This light is called indirect illumination, and in order to calculate it, we take the intersection point from the primary ray and generate a secondary ray that bounces off it.
This secondary ray then hits a new surface such as this point on the wall. From this secondary point we send out a new set of shadow rays to each light source to see whether the point on the wall is in shadows or whether it’s directly illuminated. The results from these new shadow rays and the attributes of the corresponding light sources are combined with the color of the wall’s surface, essentially turning this point on the wall into a light source that illuminates the backside of the barbershop pole.
Sometimes this point is still in shadows, so we create an additional secondary ray from the point on the wall and send it in a new direction and see what it hits. Then we calculate how that third point is directly illuminated using yet another set of shadow rays thereby turning this third point into a light source that illuminates the previous point. This secondary ray bouncing happens multiple times, and each time we send shadow rays to the light sources and check how that point is illuminated.
The purpose of bouncing the secondary rays around and sending out shadow rays at each point is to find different paths where light bounces off different surfaces and indirectly illuminates the original point where the primary ray hits. Furthermore, by sending a thousand rays through random points in a single pixel, and by having thousands of secondary rays bounce in different directions, we get an accurate approximation for indirect illumination or how this pixel is illuminated by light bouncing off the other objects. It's called path tracing because by using these primary rays, secondary rays and shadow rays, we’re finding billions of paths from the camera through different points in the scene and to the light sources.
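Putting the pieces together, a heavily condensed sketch of a single path might look like the Python below. Here `intersect`, `direct_light`, and `scatter` are hypothetical caller-supplied functions standing in for the triangle lookup, the shadow-ray test against every light, and the material-dependent bounce direction described above, and a hit is assumed to come back as a small dict with the intersection point and surface RGB.

```python
def trace_path(origin, direction, intersect, direct_light, scatter,
               depth=0, max_bounces=4, sky_color=(0.0, 0.0, 0.0)):
    # One path: follow the ray, gather direct light at every surface it touches,
    # and recurse along a secondary ray for the indirect part.
    if depth > max_bounces:
        return (0.0, 0.0, 0.0)
    hit = intersect(origin, direction)           # which triangle does the ray hit first?
    if hit is None:
        return sky_color                         # the ray escaped into the sky
    color = direct_light(hit)                    # shadow rays toward every light source
    bounce_dir = scatter(hit, direction)         # material-dependent secondary direction
    indirect = trace_path(hit["point"], bounce_dir, intersect, direct_light, scatter,
                          depth + 1, max_bounces, sky_color)
    # Tint the bounced (indirect) light by the surface color before adding it in.
    return tuple(c + hit["rgb"][i] * indirect[i] for i, c in enumerate(color))
```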
One additional benefit of indirect illumination and the use of secondary rays is that color can bounce from one object to another. For example, when we place a red balloon next to the wall and brighten the scene, some secondary light rays are tinted red by the balloon, and this reddish color can be seen on the wall itself. An important detail is that the direction the secondary rays bounce off the surface depends on the material and texture properties assigned to the object.
For example, here is a set of spheres that are all gray, but have different roughness values that drastically change their look. Essentially, for a perfectly smooth surface with no roughness, the object becomes a mirror because every one of the secondary rays will bounce off in the same perfect reflection direction, and whatever the secondary rays hit will combine together and become visible in the mirror-like surface. However, when a material has a roughness set to 100%, then the secondary rays will bounce in entirely random directions resulting in a flat gray surface.
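As a rough illustration of that roughness behavior, the sketch below blends between a perfect mirror reflection and a random direction; this linear blend is a toy stand-in for a real material model, not how production renderers actually sample materials.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(direction, normal):
    # Perfect mirror reflection of `direction` about the surface normal.
    d = 2.0 * dot(direction, normal)
    return tuple(direction[i] - d * normal[i] for i in range(3))

def random_unit_vector():
    # Rejection-sample a random direction on the unit sphere.
    while True:
        v = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
        length_sq = dot(v, v)
        if 0.0 < length_sq <= 1.0:
            return tuple(c / length_sq ** 0.5 for c in v)

def bounce_direction(incoming, normal, roughness):
    # roughness 0.0 -> pure mirror reflection; roughness 1.0 -> fully random bounce.
    mirror = reflect(incoming, normal)
    scatter = random_unit_vector()
    if dot(scatter, normal) < 0.0:               # keep the random ray above the surface
        scatter = tuple(-c for c in scatter)
    blended = tuple((1.0 - roughness) * m + roughness * s
                    for m, s in zip(mirror, scatter))
    length = dot(blended, blended) ** 0.5
    return tuple(c / length for c in blended)
```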
Furthermore, if an object is assigned a glass material, then additional refraction rays that pass through the glass are generated, and the color and brightness of the pixels in the glass will depend mostly on the direction of the refraction rays and what those rays hit. Here’s an interesting scene of some glass and mirror objects that truly show the power of path tracing, and you can see multiple mirror bounces in some of the objects and proper refraction in the glass. Note that for this barbershop scene a thousand rays per pixel and four secondary bounces are the render settings we chose during scene setup.
Other scenes use different numbers of rays per pixel, secondary bounces, and light sources. When we multiply these values together with the number of pixels in an image we get the total number of rays required to generate a single image. Furthermore, animations typically have 24 frames a second, so a 20-minute animation requires over a quadrillion rays, and that’s why path tracing was considered computationally impossible for TV shows and movies for decades.
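As a back-of-the-envelope check, the short calculation below multiplies the quoted settings together for a 4K, 20-minute animation; the two light sources are an assumption, since the exact count wasn't stated.

```python
# Back-of-the-envelope ray count (the two light sources are an assumption).
pixels         = 3840 * 2160                  # one 4K frame, about 8.3 million pixels
rays_per_pixel = 1000
bounces        = 4                            # secondary bounces per primary ray
lights         = 2                            # shadow rays sent at every hit point
rays_per_path  = (1 + bounces) * (1 + lights) # primary + secondaries, plus shadow rays
rays_per_frame = pixels * rays_per_pixel * rays_per_path
frames         = 24 * 60 * 20                 # 24 fps for a 20-minute animation
print(f"{rays_per_frame:.2e} rays per frame")     # ~1.2e11: over a hundred billion
print(f"{rays_per_frame * frames:.2e} in total")  # ~3.6e15: a few quadrillion
```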
The other key problem was figuring out which one of the scene’s 8 million triangles each ray hits first. So let’s see how these problems are solved, and we’ll start by transitioning to a new scene and seeing how ray-triangle intersections are calculated. Let’s simplify the scene down to one ray and two triangles and find which one the ray hits.
We start by extending the planes that the triangles lie on and then, using the equations of the planes and the ray, we calculate the point at which they intersect. Now that we have a set of intersection points on separate planes, we check whether each point lies inside its corresponding triangle. If it does, the ray hits that triangle; if it doesn’t, the ray misses it.
These steps are relatively simple, and with 10 triangles, we can do this over and over, once for each triangle. If multiple triangles are hit we do a distance calculation to find the closest one. However, when a scene has millions of triangles, finding which one triangle a single ray hits first becomes incredibly repetitive and computationally problematic.
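Here's what that per-triangle test might look like in Python, using the plane-intersection-then-inside-test approach just described; it's a straightforward, unoptimized version for illustration.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    # Step 1: intersect the ray with the plane the triangle lies on.
    normal = cross(sub(v1, v0), sub(v2, v0))
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None                               # ray is parallel to the plane
    t = dot(normal, sub(v0, origin)) / denom
    if t < 0.0:
        return None                               # plane is behind the ray's origin
    point = tuple(o + t * d for o, d in zip(origin, direction))
    # Step 2: check whether the intersection point lies inside the triangle,
    # i.e. on the inner side of all three edges.
    for a, b in ((v0, v1), (v1, v2), (v2, v0)):
        if dot(normal, cross(sub(b, a), sub(point, a))) < 0.0:
            return None                           # outside this edge: a miss
    return t                                      # distance along the ray to the hit

# Example: one ray against two triangles; keep the closest hit.
triangles = [((-1, -1, -5), (1, -1, -5), (0, 1, -5)),
             ((-1, -1, -9), (1, -1, -9), (0, 1, -9))]
hits = [ray_hits_triangle((0, 0, 0), (0, 0, -1), *tri) for tri in triangles]
print(min(t for t in hits if t is not None))      # 5.0: the nearer triangle
```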
We solve this by using what’s called a bounding volume hierarchy or BVH. Essentially, we take triangles in the scene and, using their 3D coordinates, we divide them into two separate boxes called bounding volumes. Each of these boxes contains half of all the triangles in the scene.
Then we take these 2 boxes with their 1.5 million triangles each and divide them again into boxes with 750,000 triangles. We keep dividing the triangles into progressively smaller pairs of boxes for a total of 19 divides.
In the end we’ve separated 3 million triangles into a hierarchy of 19 levels of boxes, with a total of roughly 525 thousand very small boxes at the bottom, each with around 6 triangles inside. The key is that all of these boxes have their sides aligned with the coordinate axes, which makes the intersection calculation far easier. For example, if we have a ray and two axis-aligned boxes, finding whether it hits box A or box B is just a matter of finding the intercept with the plane y = 6 and then seeing whether the intercept coordinates fall between box A’s bounds or between box B’s bounds.
Then we do the same thing inside box B, but using the axis-aligned coordinates of the two smaller boxes inside of it. For a scene of 3 million triangles, these 19 levels of box divisions form a binary tree or hierarchy, hence the name bounding volume hierarchy. At each branch we perform a simple ray-box intersection calculation to see which box the ray hits first, and then the ray travels to the next branch.
At the very bottom, once a ray finishes traveling through all the bounding volume branches, which is called BVH traversal, we end up with a small box of only 6 triangles. We then do the ray-triangle intersection calculation that we mentioned earlier with just these 6 triangles. As a result, BVH trees and traversal reduce tens of millions of calculations down to a handful of simple ray-box intersections followed by 6 ray-triangle intersections.
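A minimal Python sketch of a BVH node and its traversal might look like the following; the node layout and the slab-style ray-box test are illustrative, not a description of how any particular GPU implements its RT cores.

```python
class BVHNode:
    # Either an inner node with two child boxes, or a leaf holding a few triangles.
    def __init__(self, box_min, box_max, left=None, right=None, triangles=None):
        self.box_min, self.box_max = box_min, box_max
        self.left, self.right = left, right
        self.triangles = triangles or []

def ray_hits_box(origin, direction, box_min, box_max):
    # "Slab" test against an axis-aligned box: intersect the ray with the pair of
    # planes on each axis and check that the entry/exit intervals overlap.
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        if abs(direction[axis]) < 1e-12:
            if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                return False                      # parallel to this slab and outside it
            continue
        t1 = (box_min[axis] - origin[axis]) / direction[axis]
        t2 = (box_max[axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far

def traverse(node, origin, direction):
    # Walk down the hierarchy, skipping any box the ray misses, and collect the
    # triangles of the small leaf boxes the ray actually enters.
    if node is None or not ray_hits_box(origin, direction, node.box_min, node.box_max):
        return []
    if node.triangles:                            # a leaf: only ~6 triangles to test
        return node.triangles
    return (traverse(node.left, origin, direction) +
            traverse(node.right, origin, direction))
```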
Using BVHs helps to solve which triangle a ray will hit first, but doesn’t fix the fact that a single frame of animation requires over a hundred billion rays. The solution is in the incredibly powerful GPUs we now have. When we open up this GPU, we find a rather large microchip that has 10,496 CUDA or shading cores and 82 Ray Tracing or RT cores.
The CUDA cores perform basic arithmetic, while the ray tracing cores are specially designed and optimized to execute Ray Tracing. Inside each RT core are two sections: the BVH traversal section takes in the coordinates of the boxes and the direction of the ray and executes BVH traversal in nanoseconds. Then, the ray-triangle intersection section uses the coordinates of the six or so triangles in the smallest bounding volume and quickly finds which triangle the ray hits first.
The RT cores operate in parallel with one another and pipeline the operations so that a few billion rays can be handled every second, and a complex scene like this one can be rendered in 4 minutes. Overall, path tracing’s computationally impossible problems are solved by using bounding volume hierarchies along with improvements in GPU hardware. One crazy fact is that the most powerful supercomputer in the year 2000 was the ASCI White, which cost 110 million dollars and could perform 12.3 trillion operations a second. Compare this with the Nvidia 3090 GPU, which first came out in 2020 for around fifteen hundred dollars and whose CUDA or shading cores perform 36 trillion operations a second. It’s mind-boggling how such an incredible amount of computing power can fit into a graphics card the size of a shoebox, and how computer-generated images or CGI and special effects, which used to be only for high-budget films, can now be created on a desktop computer.
Ray Tracing is a fusion of a variety of different disciplines, from the physics of light to trigonometry, vectors, and matrices, and then also computer science, algorithms, and hardware. Covering all these topics would require multiple hour-long videos which we don’t have time to do, but luckily Brilliant, the sponsor of this video, already has several free and easy-to-access courses that explore these topics. Brilliant is where you learn by doing, and is a website filled with thousands of fun and interactive modules, loaded with subjects ranging from the fundamentals of math to quantum mechanics to programming in Python to biology, and much more.
When I learn new things on Brilliant, I like to think about Steve Jobs, and how he took a calligraphy class at college. Although at the time it had no practical application in his life, 10 years later when designing the Macintosh computer, he applied all the lessons from that calligraphy course to designing the typefaces and proportionally spaced fonts of the Mac. The key is that as you progress through Brilliant’s interactive lessons and learn new things, you may not know how those lessons apply to your job or life, but there will be one or two courses that will click into place and change the trajectory of your career.
However, if you don’t try out their courses, then you’ll never know. The other reason why Steve Jobs is applicable to ray tracing is because he was the CEO of Pixar from 1986 until 2006 and helped to design the computers that rendered some of its first movies. To be a successful inventor like Steve Jobs, you need to be well versed in a wide range of disciplines.
For the viewers of this channel, Brilliant is providing a free 30-day trial with access to all their thousands of lessons and is also offering 20% off an annual subscription. Just go to brilliant.org/brancheducation.
The link is in the description below. We loved making this video because path tracing is an algorithm we use daily: all our animations are created and rendered using software called Blender, which uses path tracing in its rendering engine. Specifically, here are all the scenes we used and some statistics that you can pause the video and look at.
It takes a ton of work to create high quality educational videos. Researching this video, writing the script, and then animating the scenes has taken us over 800 hours, so if you could take a quick second to like this video, subscribe to the channel, write a comment below and share it with someone who watches TV or movies it would help us a ton. Furthermore, we’d like to give a shout-out to the Blender Dev Team.
Blender is incredibly powerful, free-to-use modeling and animation software. Each of these scenes was made by an incredible artist, and you can download them for free from the Blender website. Finally, one question you may have is: how is ray tracing used in video games?
There are many different methods, so we’ll cover just a few of them. The first one is similar to path tracing but with some shortcuts. For a given environment in a video game, a very low-resolution duplicate of all the models in the scene is created.
Path tracing is then used to determine direct and indirect lighting for each of these low-resolution objects and the results are saved into a light map on the low-resolution duplicate. Then the light map is applied to the high-resolution version of the objects in the scene, creating realistic indirect lighting and shadows on the high-resolution objects. This method is pretty good at approximating indirect lighting and is one of the ray tracing techniques used in Unreal Engine’s Lumen renderer.
The second and completely different method for using ray tracing in video games is called screen space ray tracing. It doesn’t use the scene’s geometries but rather uses the images and data generated from the video game graphics rendering pipeline where all the objects in the scene undergo 3D transformations to build a flat 2D image on the viewscreen. During the video game graphics process, additional data is created, such as a depth map that shows how far each object and the corresponding pixels are from the camera, as well as a normal map that shows the direction each of the objects and pixels are facing.
By combining the view screen, the depth map, and the normal map, we can generate an approximation for the X, Y, and Z values of the various objects in the scene, as well as determine what direction each pixel is facing. Now that we have a simplified scene, let’s say this lake is reflective, and we want to know what pixels should be shown in its reflection. To figure it out, we use ray tracing with this simplified screen space 3D representation and bounce the rays off of the lake’s pixels using the normal map.
These rays then continue through the simplified geometry and hit the trees behind it, thus producing a reflection of the trees on the lake. One problem with screen space ray tracing is that it can only use the data that’s on the screen. As a result, when the camera moves, the trees move out of view, and thus the trees are removed from the screen space data, and it’s impossible to see them in the reflection.
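For a sense of the mechanics, here's a very rough Python sketch of the screen-space idea: starting at a reflective pixel, step across the screen along the reflected direction and stop at the first pixel whose stored depth says geometry sits in front of the ray. Real engines march in 3D and reproject each step, handle surface thickness, and fade out near the screen edges; everything here, including the fixed depth increment per step, is a simplification.

```python
def screen_space_reflection(x, y, depth_map, reflect_dir_2d, step=2.0, max_steps=200):
    # March across the screen from pixel (x, y) along the reflected direction
    # (here already projected to 2D using the normal map) and return the first
    # pixel whose stored depth indicates geometry in front of the ray.
    height, width = len(depth_map), len(depth_map[0])
    ray_depth = depth_map[y][x]
    dx, dy = reflect_dir_2d                       # screen-space reflection direction
    px, py = float(x), float(y)
    for _ in range(max_steps):
        px, py = px + dx * step, py + dy * step
        ray_depth += 0.01 * step                  # assumed depth gained per step
        ix, iy = int(px), int(py)
        if not (0 <= ix < width and 0 <= iy < height):
            return None                           # ray left the screen: no data to reflect
        if depth_map[iy][ix] <= ray_depth:
            return ix, iy                         # reflect this pixel's color on the lake
    return None
```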
Additionally, screen space ray tracing doesn’t allow for reflections of objects behind the camera. This type of ray tracing, along with other rendering algorithms, is used in games like Cyberpunk. And if you’re curious as to how video game graphics work, we have a separate video that explores all the steps, such as Vertex Shading, Rasterization, and Fragment Shading.
The video game graphics rendering pipeline is entirely different from Ray Tracing, so we recommend you check it out. And, that’s pretty much it for Ray Tracing. We’d like to give a shoutout to Cem Yuksel, a professor at the School of Computing at the University of Utah.
On his YouTube channel, you can find his lecture series on computer graphics and interactive graphics, which were both instrumental in the research for this video. This is Branch Education, and we create 3D animations that dive deeply into the technology that drives our modern world. Watch another Branch video by clicking one of these cards or click here to subscribe.