How does Ray Tracing Work in Video Games and Movies?

Branch Education
Go to http://brilliant.org/BranchEducation/ for a 30-day free trial and expand your knowledge.
Video Transcript:
Every new TV show and movie that uses  computer-generated images and special effects relies on Ray Tracing. For example,  in order to build an interstellar battle, set in a galaxy far, far away, 3D  artists model and texture the spaceships, position them around the scene with lights, a  background, and a camera, and then render the scene. Rendering is a computational process that  simulates how rays of light bounce off of and illuminate each of the models, thus transforming  a scene full of simple 3D models into a realistic environment.
There are many different ray  tracing algorithms used to render scenes, but the current industry standard in TV  shows and movies is called path tracing. This algorithm requires an unimaginable number of  calculations. For example, if you had the entire population of the world working together  and performing 1 calculation every second, it would take 12 days of nonstop problem solving  to turn this scene into this image.
Due to these incredible computational requirements, path tracing was considered impossible for anything but supercomputers for decades. In fact, this algorithm for simulating light was first conceptualized in 1986; however, it took 30 years before movies like Zootopia, Moana, Finding Dory and Coco could be rendered using path tracing, and even then, rendering these movies required a server farm of thousands of computers and multiple months to complete. So, why does path tracing require quadrillions of calculations?
And how  does Ray Tracing work? Well, in this video, we’ll answer these two questions, and in the process,  you’ll get a better understanding of how Computer Generated Images or CGI and special effects are  created for TV and movies. After that we’ll open up this GPU and see how its architecture is  specifically designed to execute ray tracing, enabling it to render this scene in only a few  minutes.
And finally, we’ll investigate how video games like Cyberpunk or the Unreal Engine Lumen Renderer use Ray Tracing. So, let’s dive right in. This video is sponsored by Brilliant.org.
Let’s first see how Path Tracing works and how this dragon and kingdom are created and turned into a setting for a fantasy show. To make the scene, an artist first spends a few months modeling everything: the islands, the castles, the houses, the trees, and of course, the dragon. Although these models may have some smooth curves or squares and polygons, they’re actually all broken down into small triangles.
In short, GPUs almost exclusively work with 3D scenes made of triangles, and this scene is built from 3.2 million triangles. After a model is built, the 3D artist assigns a texture to it which defines both the color and material attributes, such as whether the surface is rough, smooth, metallic, glass, water-like, or composed of a wide range of other materials.
Next, the completed models are properly  positioned around the scene and the artist adds lights such as the sky and the sun and adjusts  their intensity and direction to simulate the time of day. Finally, a virtual camera is added  and the scene is rendered and brought to life. As mentioned earlier, path tracing simulates  how light interacts with and bounces off every surface in the scene, thereby producing  realistic effects such as smooth shadows across the buildings or the way light interacts  with the water and produces bright highlights in some areas and water covered sand in others.
In the real world, light rays start at the sun, and when they hit a surface such as this red roof,  some light is absorbed while the red light is reflected, thus tinting the light based on the  color of the object. These now tinted light rays bounce off the surface and make their  way to the camera and produce a 2D image. With this scene, a near infinite number  of light rays are produced by the sun and sky and only a small fraction of them  actually reach the camera.
Calculating an infinite number of light rays is impossible  and only the light rays that reach the camera are useful, and therefore with path tracing we  don’t send rays out from the sky or light source, but rather we send out rays from a virtual  camera and into the scene. We then determine which objects the rays hit and calculate how those  objects are illuminated by the light sources. With computer-generated images or CGI, the 2D  image is represented by a view plane in front of the virtual camera.
This view plane has the same pixel count as the final image, so a 4K image has 8.3 million pixels. Furthermore, by animating the camera around or changing its field of view, the view plane will correspondingly change.
Let’s transition to an indoor scene such as this barbershop, which contains 8 million triangles and is actually more complicated than the island kingdom. In order to create this image on the view plane, a total of 8.3 billion rays, a thousand rays per pixel, are sent out from the virtual camera through the view plane and into the scene.
Ray Tracing is a massively  parallel operation because each pixel is independent from all other pixels. This means that  the thousand rays from one pixel can be calculated at the same time as the rays from the next pixel  over and so on. Once a single pixel’s rays finish flying around the scene, the results are combined  with the other rays and pixels to form a single image.
If we were to show billions of rays, the  scene would quickly become inundated with lines, so let’s simplify it down to a single ray  running through one pixel of the viewing plane. This ray starts at the camera, travels through a  random point in the pixel and into the scene. It flies straight and eventually hits a triangle,  and once it does, that object’s color becomes associated with that ray and pixel.
For example,  when the ray hits this chair, then the pixel becomes red. The other nearby rays running through  random places in the same pixel will hit pretty close to this ray and have their colors averaged  together. These rays are called primary rays and they answer the question of what triangle and  object do the rays first hit and what basic color should be in that specific pixel.
Another example  is that these rays running through this pixel hit the blue stripe on the barbershop pole turning the  pixel blue. The other billions of rays do the same thing resulting in a single image with the proper  3D perspective from the virtual camera. This image is fairly flat colored because each pixel just  has the simple color of the object the rays hit.
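To make this concrete, here is a rough Python sketch of how a renderer might generate primary rays; it is not the code behind the renders in this video, and the camera and scene helpers (generate_ray, closest_intersection, surface_rgb, background_rgb) are assumptions used only for illustration.

```python
import random

# Rough sketch of generating primary rays: for every pixel of the view plane,
# shoot many rays through random points inside that pixel and average the
# colors of whatever they hit. The camera and scene helpers are hypothetical.
def render(camera, scene, width, height, rays_per_pixel=1000):
    image = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            r = g = b = 0.0
            for _ in range(rays_per_pixel):
                # Jitter the sample position inside the pixel.
                u = (x + random.random()) / width
                v = (y + random.random()) / height
                ray = camera.generate_ray(u, v)         # hypothetical helper
                hit = scene.closest_intersection(ray)   # hypothetical helper
                color = hit.surface_rgb if hit else scene.background_rgb
                r += color[0]; g += color[1]; b += color[2]
            image[y][x] = (r / rays_per_pixel, g / rays_per_pixel, b / rays_per_pixel)
    return image
```

Averaging many jittered samples per pixel is also what smooths out noise and hard pixel edges in the final image.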
So the next question is: how is the location where the primary ray hits illuminated by the light sources, and how bright or dark should the pixel’s color be shaded? For example, when you look at the blue stripe of the barbershop pole, the entire stripe is just blue, but in the rendered image, there’s a gradient from bright to dark across a number of pixels depending on how the triangles are facing the lights and the window. Specifically, the dark blue backside doesn’t face any of the light sources and therefore its illumination comes only from light bouncing off the nearby walls.
Furthermore, when the lighting  conditions change and more light enters the scene, the entire barbershop pole brightens up. This  accurate lighting applies to all the objects in the scene and is what transforms the scene and  makes it look realistic. In order to accurately determine the brightness of these blue pixels, ray  tracing first needs to determine how the surface is illuminated directly by the light sources,  which is called direct illumination, and second, how the surface is illuminated by light bouncing  off other objects, which is called indirect illumination.
Combining direct and indirect  illumination is called global illumination. In order to calculate direct illumination, we  start at the intersection point where the primary ray hits the triangle in the barbershop pole and  then we generate additional rays called shadow rays and send them in the direction of each light  source such as the light bulbs or the sun outside the window. If there are no objects between  the intersection point and a light source, then that means that this point on the blue stripe  is directly illuminated by that light source.
For each light source that directly illuminates this point, we factor in the light source’s brightness, size, color, distance, and the direction the surface of the blue stripe is facing. All these factors are multiplied by the Red, Green, and Blue, or RGB, values of the blue stripe, which in turn changes the shading or brightness of the pixel that the primary ray went through. Let’s brighten the room again, and you can see the RGB values increase for this pixel.
Now let’s dim the room once more and look at  a different pixel whose primary ray hits the backside of the barbershop pole. A similar set of  shadow rays are sent out from this intersection point to each light source, but each of these  rays is blocked by other triangles in the pole, and thus this point doesn’t receive any direct  illumination from any of the light sources, leaving the pixel dark. These rays are  called shadow rays because they determine whether a location is directly illuminated by  a light source or whether it’s in a shadow.
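As a hedged sketch, the shadow-ray step described above could be written roughly like this in Python; the Ray class, the scene's any_intersection test, and the small vector helpers (sub, dot, normalize, length) are assumptions for illustration, not any particular renderer's API.

```python
def direct_illumination(point, normal, surface_rgb, lights, scene):
    """Hedged sketch of the shadow-ray step. The Ray class, the scene's
    any_intersection test, and the vector helpers (sub, dot, normalize,
    length) are assumptions used only for illustration."""
    total = [0.0, 0.0, 0.0]
    for light in lights:
        to_light = sub(light.position, point)
        dist = length(to_light)
        direction = normalize(to_light)
        shadow_ray = Ray(origin=point, direction=direction)
        # If anything sits between this point and the light, the point is in
        # shadow with respect to that light and it contributes nothing.
        if scene.any_intersection(shadow_ray, max_dist=dist):
            continue
        facing = max(0.0, dot(normal, direction))    # surfaces facing the light receive more
        falloff = light.intensity / (dist * dist)    # farther lights contribute less
        for c in range(3):
            total[c] += surface_rgb[c] * light.color[c] * facing * falloff
    return total
```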
You might think that this backside should be  entirely black because it’s in the shadows and none of the light rays from the light sources can  reach it. However, this backside still has color because it’s illuminated by light bouncing off the  walls. This light is called indirect illumination, and in order to calculate it, we take the  intersection point from the primary ray and generate a secondary ray that bounces off it.
This  secondary ray then hits a new surface such as this point on the wall. From this secondary point we  send out a new set of shadow rays to each light source to see whether the point on the wall is  in shadows or whether it’s directly illuminated. The results from these new shadow rays and the  attributes of the corresponding light sources are combined with the color of the wall’s surface,  essentially turning this point on the wall into a light source that illuminates the backside of  the barbershop pole.
Sometimes this point is still in shadows, so we create an additional  secondary ray from the point on the wall and send it in a new direction and see what it hits.  Then we calculate how that third point is directly illuminated using yet another set of shadow rays  thereby turning this third point into a light source that illuminates the previous point. This  secondary ray bouncing happens multiple times, and each time we send shadow rays to the light  sources and check how that point is illuminated.
The purpose of bouncing the secondary rays  around and sending out shadow rays at each point is to find different paths where  light bounces off different surfaces and indirectly illuminates the original point  where the primary ray hits. Furthermore, by sending a thousand rays through random points  in a single pixel, and by having thousands of secondary rays bounce in different directions,  we get an accurate approximation for indirect illumination or how this pixel is illuminated  by light bouncing off the other objects. It's called path tracing because by using these  primary rays, secondary rays and shadow rays, we’re finding billions of paths from the camera  through different points in the scene and to the light sources.
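Putting the pieces together, one bounce loop of a path tracer might look roughly like the following sketch, reusing the direct_illumination helper from above; the scene, Ray, and random_hemisphere_direction names are again assumptions rather than real API calls.

```python
def trace_path(ray, scene, lights, max_bounces=4):
    """Hedged sketch of one path: at each bounce, add the direct lighting found
    by the shadow rays, then spawn a secondary ray in a new direction. Reuses
    the direct_illumination sketch above; other helper names are assumptions."""
    color = [0.0, 0.0, 0.0]
    throughput = [1.0, 1.0, 1.0]            # tint the path has picked up so far
    for _ in range(max_bounces):
        hit = scene.closest_intersection(ray)
        if hit is None:
            break
        direct = direct_illumination(hit.point, hit.normal, hit.surface_rgb, lights, scene)
        for c in range(3):
            color[c] += throughput[c] * direct[c]
            throughput[c] *= hit.surface_rgb[c]   # later bounces are tinted by this surface
        # Secondary ray: bounce off in a new, material-dependent direction.
        ray = Ray(origin=hit.point, direction=random_hemisphere_direction(hit.normal))
    return color
```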
One additional benefit of indirect  illumination and the use of secondary rays is that color can bounce from one object to another.  For example, when we place a red balloon next to the wall and brighten the scene, some secondary  light rays are tinted red by the balloon, and this reddish color can be seen on the wall itself. An important detail is that the direction the secondary rays bounce off the surface depends  on the material and texture properties assigned to the object.
For example, here is  a set of spheres that are all gray, but have different roughness values that  drastically change their look. Essentially, for a perfectly smooth surface with no roughness, the  object becomes a mirror because every one of the secondary rays will bounce off in the same perfect  reflection direction, and whatever the secondary rays hit will combine together and become  visible in the mirror-like surface. However, when a material has a roughness set to 100%, then  the secondary rays will bounce in entirely random directions resulting in a flat gray surface.
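A simplified way to picture this in code is to blend between the perfect mirror direction and a random direction based on roughness. Real renderers sample physically based material models instead, so treat this purely as an illustration; reflect, random_hemisphere_direction and normalize are assumed vector helpers.

```python
def bounce_direction(incoming, normal, roughness):
    """Hedged illustration only: blend a perfect mirror reflection (roughness 0)
    with a random hemisphere direction (roughness 1)."""
    mirror = reflect(incoming, normal)                 # all rays agree: a mirror image forms
    scattered = random_hemisphere_direction(normal)    # all rays disagree: a flat, diffuse look
    blended = [m * (1.0 - roughness) + s * roughness for m, s in zip(mirror, scattered)]
    return normalize(blended)
```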
Furthermore, if an object is assigned a glass material, then additional refraction rays  that pass through the glass are generated, and the color and brightness of the pixels in the  glass will depend mostly on the direction of the refraction rays and what those rays hit. Here’s an  interesting scene of some glass and mirror objects that truly show the power of path tracing, and  you can see multiple mirror bounces in some of the objects and proper refraction in the glass. Note that for this barbershop scene a thousand rays per pixel and four secondary bounces are the  render settings we chose during scene setup.
Other scenes use different numbers of rays per pixel,  secondary bounces, and light sources. When we multiply these values together with the number of  pixels in an image we get the total number of rays required to generate a single image. Furthermore,  animations typically have 24 frames a second, so a 20-minute animation requires over a  quadrillion rays, and that’s why path tracing was considered computationally impossible for  TV shows and movies for decades.
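The arithmetic behind that claim is easy to check; this short calculation uses the 4K resolution, thousand rays per pixel, and 24 frames per second mentioned above.

```python
# Back-of-the-envelope check using the settings mentioned above:
# a 4K frame, a thousand rays per pixel, 24 frames per second, 20 minutes.
pixels_per_frame = 3840 * 2160                                # ~8.3 million pixels
rays_per_pixel = 1000
primary_rays_per_frame = pixels_per_frame * rays_per_pixel    # ~8.3 billion
frames = 24 * 60 * 20                                         # 28,800 frames
total_primary_rays = primary_rays_per_frame * frames          # ~2.4e14
print(f"{total_primary_rays:.2e} primary rays")
# Counting the secondary and shadow rays spawned along every path pushes the
# total well past a quadrillion.
```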
The other key problem was figuring out which one triangle  out of 8 million each of the rays hits first. So let’s see how these problems are solved and  we’ll start by transitioning to a new scene and see how ray-triangle intersections are calculated.  Let’s simplify the scene down to one ray and two triangles and find which one the ray hits.
We  start by extending the planes that the triangles are on and then, using the equations of the planes  and the ray, we calculate the point at which they intersect. Now that we have a set of intersection  points on separate planes, we find whether the point is inside each corresponding triangle. If  it is, then that means the ray hits the triangle, and if it isn’t that means it misses the  triangle.
These steps are relatively simple, and with 10 triangles, we can do this over  and over, once for each triangle. If multiple triangles are hit we do a distance calculation to  find the closest one. However, when a scene has millions of triangles, finding which one triangle  a single ray hits first becomes incredibly repetitive and computationally problematic.
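Here is a rough Python version of those two steps, intersecting the ray with the triangle's plane and then testing whether the point lies inside the triangle; the vector helpers (cross, dot, sub, add, scale) are assumed to exist.

```python
def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-8):
    """Hedged sketch of the two steps above: intersect the ray with the
    triangle's plane, then test whether that point lies inside the triangle.
    The vector helpers (cross, dot, sub, add, scale) are assumed."""
    # Step 1: the triangle's plane, defined by its normal and one vertex.
    normal = cross(sub(v1, v0), sub(v2, v0))
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None                       # ray is parallel to the plane: no hit
    t = dot(normal, sub(v0, origin)) / denom
    if t < 0:
        return None                       # the plane is behind the ray's origin
    p = add(origin, scale(direction, t))  # intersection point on the plane
    # Step 2: inside-outside test: p must sit on the inner side of all three edges.
    for a, b in ((v0, v1), (v1, v2), (v2, v0)):
        if dot(normal, cross(sub(b, a), sub(p, a))) < 0:
            return None                   # outside this edge, so the ray misses
    return t                              # distance along the ray to the hit
```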
We solve this by using what’s called a bounding volume hierarchy or BVH. Essentially, we take  triangles in the scene and, using their 3D coordinates, we divide them into two separate  boxes called bounding volumes. Each of these boxes contains half of all the triangles in the  scene.
Then we take these 2 boxes with their 1.5 million triangles and divide them again into boxes with 750,000 triangles. We keep on dividing the triangles into more and more progressively smaller pairs of boxes for a total of 19 divides.
In the end we’ve separated 3 million triangles into a hierarchy of 19 divisions of boxes with a total of 525 thousand very small boxes at the bottom, each with around 6 triangles inside. The key is that all of these boxes have their sides aligned with the coordinate axes, which makes the intersection calculation far easier. For example, if we have a ray and two axis-aligned boxes, finding whether it hits box A or box B is just a matter of finding the intercept with the plane of Y equals six, and then seeing whether the intercept coordinates fall between box A’s bounds or between box B’s bounds.
Then we do the same thing inside box B, but using the axis-aligned coordinates of the two smaller boxes inside of it. For a scene of 3 million triangles, these 19 levels of box divisions form a binary tree, or hierarchy, hence the name bounding volume hierarchy. At each branch we perform a simple ray-box intersection calculation to see which box the ray hits first, and then the ray travels to the next branch.
At  the very bottom, once a ray finishes traveling through all the bounding volume branches,  which is called BVH traversal, we end up with a small box of only 6 triangles. We then do  the ray-triangle intersection calculation that we mentioned earlier with just these 6 triangles. As a result, BVH trees and traversal reduce tens of millions of calculations down to  a handful of simple ray box intersections followed by 6 ray triangle intersections.
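A sketch of what that looks like in code: an axis-aligned "slab" test per box, plus a recursive walk down the tree that only runs the triangle test at a leaf. The node structure here (box_min, box_max, left, right, triangles) is an assumption, and ray_hits_triangle is the sketch from earlier.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Hedged 'slab test' sketch: because the box is axis-aligned, each axis
    needs only two plane intersections, and the ray hits the box when the
    intervals from all three axes overlap."""
    t_near, t_far = float("-inf"), float("inf")
    for axis in range(3):
        if abs(direction[axis]) < 1e-12:
            if not (box_min[axis] <= origin[axis] <= box_max[axis]):
                return False              # parallel to these planes and outside them
            continue
        t1 = (box_min[axis] - origin[axis]) / direction[axis]
        t2 = (box_max[axis] - origin[axis]) / direction[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far and t_far >= 0

def traverse_bvh(node, origin, direction):
    """Walk the hierarchy: skip any box the ray misses, and only run the
    ray-triangle test on the handful of triangles stored in a leaf. The node
    fields (box_min, box_max, left, right, triangles) are assumptions."""
    if not ray_hits_box(origin, direction, node.box_min, node.box_max):
        return None
    if node.triangles is not None:        # a leaf: roughly 6 triangles to test
        hits = (ray_hits_triangle(origin, direction, *tri) for tri in node.triangles)
        return min((t for t in hits if t is not None), default=None)
    left = traverse_bvh(node.left, origin, direction)
    right = traverse_bvh(node.right, origin, direction)
    return min((t for t in (left, right) if t is not None), default=None)
```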
Using BVHs helps to solve which triangle a ray will hit first but doesn’t fix the fact  that a single frame of animation requires over a hundred billion rays. The solution is in the  incredibly powerful GPUs we now have. When we open up this GPU, we find a rather large  microchip that has 10496 CUDA or shading cores and 82 Ray Tracing or RT cores.
The  CUDA cores perform basic arithmetic while the ray tracing cores are specially designed and  optimized to execute Ray Tracing. Inside the RT cores are two sections, the BVH traversal section  takes in all the coordinates of the boxes and the direction of the ray and executes BVH traversal in  nanoseconds. Then, the ray triangle intersection section uses the coordinates of the six or so  triangles in the smallest bounding volume and quickly finds which triangle the ray hits first. 
The RT cores operate in parallel with one another and pipeline the operations so that a few billion rays can be handled every second, and a complex scene like this one can be rendered in 4 minutes. Overall, Path Tracing’s computationally impossible problems are solved by using bounding volume hierarchies along with improvements in GPU hardware. One crazy fact is that the most powerful supercomputer in the year 2000 was the ASCI White, which cost 110 million dollars and could perform 12.3 trillion operations a second.
Compare this with the Nvidia 3090 GPU, which cost a few thousand dollars when it first came out in 2020, and whose CUDA or shading cores perform 36 trillion operations a second. It’s mind-boggling how such an incredible amount of computing power can fit into a graphics card the size of a shoebox and how computer-generated images or CGI and special effects, which used to be only for high-budget films, can now be created on a desktop computer.
Ray Tracing is a fusion of a variety of different disciplines from the physics of light, to  trigonometry, vectors, and matrices, and then also computer science, algorithms and hardware.  Covering all these topics would require multiple hour-long videos which we don’t have time to do,  but luckily Brilliant, the sponsor of this video, already has several free and easy to access  courses that explore these topics. Brilliant is where you learn by doing, and is a website filled  with thousands of fun and interactive modules, loaded with subjects ranging from the fundamentals  of math to quantum mechanics to programming in python to biology, and much more.
When I learn new  things on Brilliant, I like to think about Steve Jobs, and how he took a calligraphy class  at college. Although at the time it had no practical application in his life, 10 years later  when designing the Macintosh computer, he applied all the lessons from that calligraphy course  to designing the typefaces and proportionally spaced fonts of the Mac. The key is that as  you progress through Brilliant’s interactive lessons and learn new things, you may not know  how those lessons apply to your job or life, but there will be one or two courses that will  click into place and change the trajectory of your career.
However, if you don’t try out their  courses, then you’ll never know. The other reason why Steve Jobs is applicable to ray tracing  is because he was the CEO of Pixar from 1986 until 2006 and helped to design the computers  that rendered some of its first movies. To be a successful inventor like Steve Jobs, you need  to be well versed in a wide range of disciplines.
For the viewers of this channel, Brilliant is providing a free 30-day trial with access to all their thousands of lessons and is also offering 20% off an annual subscription. Just go to brilliant.org/brancheducation.
The link is in the description below. We loved making this video because path  tracing is an algorithm that we use daily due to the fact that all our animations are  created and rendered using a software called Blender which uses path tracing in its  rendering engine. Specifically, here are all the scenes we used and some statistics  that you can pause the video and look at.
It takes a ton of work to create high quality  educational videos. Researching this video, writing the script, and then animating  the scenes has taken us over 800 hours, so if you could take a quick second to  like this video, subscribe to the channel, write a comment below and share it with someone  who watches TV or movies it would help us a ton. Furthermore, we’d like to give a shout-out to  the Blender Dev Team.
Blender is an incredibly powerful, free-to-use modeling and animation software. Each of these scenes was made by an incredible artist and you can download them for free from the Blender website. Finally, one question you may have is: how is ray tracing used in video games?
There are many different methods, so we’ll cover  just a few of them. The first one is similar to path tracing but with some shortcuts.  For a given environment in a video game, a very low-resolution duplicate of all the models  in the scene is created.
Path tracing is then used to determine direct and indirect lighting for each  of these low-resolution objects and the results are saved into a light map on the low-resolution  duplicate. Then the light map is applied to the high-resolution version of the objects in the  scene, creating realistic indirect lighting and shadows on the high-resolution objects.  This method is pretty good at approximating indirect lighting and is one of the ray tracing  techniques used in Unreal Engine’s Lumen renderer.
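As a very rough illustration of the baking idea (not Lumen's actual implementation), a lightmap bake might evaluate lighting at a grid of points on the low-resolution duplicate ahead of time, reusing the path-tracing sketches from earlier; every helper name here (point_at_uv, Ray, random_hemisphere_direction, and so on) is an assumption.

```python
def bake_lightmap(low_res_mesh, scene, lights, texels=64, samples=256):
    """Very rough illustration of baking: evaluate direct plus indirect
    lighting at a grid of points on the low-res duplicate and store the
    results in a small texture. All helper names are assumptions."""
    lightmap = [[None] * texels for _ in range(texels)]
    for v in range(texels):
        for u in range(texels):
            point, normal = low_res_mesh.point_at_uv(u / texels, v / texels)
            light = direct_illumination(point, normal, (1.0, 1.0, 1.0), lights, scene)
            for _ in range(samples):      # gather indirect light bouncing in
                bounce = Ray(origin=point, direction=random_hemisphere_direction(normal))
                gathered = trace_path(bounce, scene, lights, max_bounces=3)
                light = [light[c] + gathered[c] / samples for c in range(3)]
            lightmap[v][u] = light
    return lightmap
# At render time the high-resolution objects simply sample this texture instead
# of tracing new rays, which is what makes the approach cheap enough for games.
```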
The second and completely different method  for using ray tracing in video games is called screen space ray tracing. It doesn’t use the  scene’s geometries but rather uses the images and data generated from the video game graphics  rendering pipeline where all the objects in the scene undergo 3D transformations to build a  flat 2D image on the viewscreen. During the video game graphics process, additional data  is created, such as a depth map that shows how far each object and the corresponding pixels  are from the camera, as well as a normal map that shows the direction each of the objects and  pixels are facing.
By combining the view screen, the depth map, and the normal map, we can generate  an approximation for the X, Y, and Z values of the various objects in the scene, as well as  determine what direction each pixel is facing. Now that we have a simplified scene, let’s say  this lake is reflective, and we want to know what pixels should be shown in its reflection.  To figure it out, we use ray tracing with this simplified screen space 3D representation and  bounce the rays off of the lake’s pixels using the normal map.
These rays then continue through the  simplified geometry and hit the trees behind it, thus producing a reflection of the trees on  the lake. One problematic issue with screen space ray tracing is that it can only use  the data that’s on the screen. As a result, when the camera moves, the trees move out of  view, and thus the trees are removed from the screen space data and it’s impossible to  see them in the reflection.
Additionally, screen space ray tracing doesn’t allow for reflections of objects behind the camera. This type of ray tracing, along with other rendering algorithms, is used in games like Cyberpunk. If you’re curious as to how video game graphics work, we have a separate video that explores all the steps such as Vertex Shading, Rasterization, and Fragment Shading.
The  video game graphics rendering pipeline is entirely different from Ray Tracing,  so we recommend you check it out. And, that’s pretty much it for Ray Tracing. We’d like to give a shoutout to Cem Yuksel, a professor at the School of Computing at the  University of Utah.
On his YouTube channel, you can find his lecture series on computer  graphics and interactive graphics, which were both instrumental in the research for this video. This is Branch Education, and we create 3D animations that dive deeply into the  technology that drives our modern world. Watch another Branch video by clicking one  of these cards or click here to subscribe.