3 ways to reduce the size of your Docker images

Raghav Dua

I'm also building Dockershrink - An Open source AI Agent to automatically reduce the size of Docker ...

Video Transcript:

Hello guys, this is Raghav, and in this video I'm going to show you the top 3 ways to reduce the size of your Docker images.

I'm going to show you how I brought a 1 GB image down to just 7 MB, which is like a 99% reduction.

And in some cases, not only do these techniques reduce your image size, they also reduce your image build time by more than 30%. So let's get started.

The first and the most popular technique is multi-stage builds. Multi-stage builds is a feature of Docker based on the idea that you don't need to include everything in your final Docker image. In fact, your final Docker image should only include things that are absolutely necessary to run your application.

So the basic idea behind multi-stage builds is that you can use multiple FROM statements in your Dockerfile. And with each FROM statement, you can use a different base image, and each of them is a new stage in your Docker image building process. In this new stage, you can selectively copy artifacts from the previous stage and leave everything else behind.

The last stage in your multi-stage Dockerfile determines the image that gets created, that is, the final image and everything it includes. So the trick is: in the last stage of your Dockerfile, you use a very light base image and only include the artifacts that are necessary to run your application. So in a traditional build, in a traditional Dockerfile, what you will do is use a base image.

You will have a bunch of dependencies or, you know, tools that you require in order to create a final application, right? You will use these dependencies and build tools, and you use your source code. You'll also copy your source code into your image.

And then, using all these things, you will create a final artifact, right? But then this becomes your final Docker image, which will contain your final artifact, but it'll also contain all the other stuff, which, by the way, you no longer need, right? So the total size of this image would roughly turn out to be something in GBs, right?

Multi-stage says that, sure, you can do all of this stuff, but in a previous stage, and then you can declare a new stage. And inside this stage, all you've got to do is use a slim base image, something like Alpine, and then only copy the final artifact, and that's it.

Then this is the last stage, stage two in this case, and this is what will create the final Docker image. And this image will be something a lot smaller, something like, you know, on the order of tens of MBs. So that's where the difference comes from.

So, in this Dockerfile, I'm building an image in which I compile my Golang application and then run the final executable as the main command.
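A single-stage Dockerfile of the kind described here might look roughly like the sketch below. The Go version tag, the directory layout, and the presence of go.mod and go.sum files are assumptions for illustration; the /app path is the only name taken from the video.

```dockerfile
# Single-stage build: the Go toolchain, the downloaded dependencies, the source
# code and the compiled binary all end up in the same final image.
# Assumption: the image tag and paths below are illustrative, not from the video.
FROM golang:1.22

WORKDIR /src

# Download the Go dependencies (assumes the project has go.mod and go.sum).
COPY go.mod go.sum ./
RUN go mod download

# Copy the source code and compile the application.
COPY . .
RUN go build -o /app .

# Run the compiled executable as the container's main command.
CMD ["/app"]
```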

So when I start the container, my app starts to run. We've got nothing.

Let's do a docker build. Okay. The build is finished.

And this image is 1 GB. Let's test it out once. Cool.

It works. The container works, the application's running great.

But if I create my image with this definition, my final image, which I'll be deploying everywhere (on staging, production, running on my local, everywhere), will not only contain the executable that needs to run, but also all the Go dependencies that I downloaded, plus the Golang image, the base image, and its underlying operating system and all its data.

And the problem is, this is all dead weight. I don't need these things in my final container. So let's use multi-stage to get rid of all the dead weight.

So let's consider this whole thing to be our first stage, right? The build process is the first stage. So I'm going to give the stage a name.

You can use AS with the FROM statement to name a particular stage, and you'll see why the naming is useful. Now we are saying that in this build stage, we will do all the heavy lifting of building the final artifacts. In this case, the final artifact is just the application, the executable that we need to run.

So we do all this building in the build stage. But notice that once the go build has run to completion and has produced the final app executable, after that point, the only thing we truly need in our Docker image is the executable application. All the other stuff is not needed.

So let's create a new stage after the go build. And now let's take only the important stuff from the build stage into the final stage. So, this is where I use the build stage's name.

I tell Docker to copy the app created in the build stage and paste it into the final image's /app.

And now, since this is the last FROM statement in my Dockerfile, this is therefore, by definition, the last stage of my Dockerfile. So whatever is written after this is the final image.

But another important thing: my final image contains a standalone executable application, right?

So I don't even need a Golang base image to start with, because that base image provides me stuff like the Golang compiler and the standard library, but I don't need those things anymore. So let's move to a much lighter base image. Let's do Alpine Linux. Good.

Now we're ready to build the image. Our final stage uses a very lightweight base image and only contains the executable application.
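Putting the steps above together, the multi-stage Dockerfile could be sketched as follows. The stage name "build", the image tags and the paths are assumptions for illustration. The CGO_ENABLED=0 setting is also an assumption that is not mentioned in the video: it asks Go for a statically linked binary, which avoids depending on the C library of the Debian-based golang image when the binary later runs on Alpine.

```dockerfile
# Stage 1: do all the heavy lifting of building the executable.
# Assumption: "build" is an illustrative stage name, set with AS.
FROM golang:1.22 AS build

WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .

# Assumption: CGO_ENABLED=0 produces a statically linked binary, so it does
# not rely on shared libraries that the final base image may not provide.
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: the last FROM statement starts the last stage, which defines the final image.
FROM alpine:3.20

# Copy only the executable from the build stage; everything else is left behind.
COPY --from=build /app /app

CMD ["/app"]
```

Only the contents of the last stage end up in the image that gets tagged; the build stage is used during the build and then left out of the result.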

Let's remove everything. Cool. Nothing.

Once again, we build. And we'll give it a different label. docker build multi.

That's cool. Okay. The build is finished.

The final goserver image this time is just about 16 MB. The previous one was 1 GB; this is just 16 MB. That's like a 98% reduction in size. If I try to run it, the application still runs fine because it has everything that it needs inside the container. As long as my executable is in there, everything else can be excluded.

Another thing to note is that I used Alpine as the base image of my final stage. Multi-stage builds are most impactful when the base image of your final stage is a light one. You can use Alpine or even Google's distroless images.

Awesome. That was multi-stage. The second important technique to reduce image size is to use fewer layers in your image.

Every RUN, COPY, or ADD statement that you write in your Dockerfile creates a new layer in your image. And each layer adds to the image size as well as the build time.

In this Dockerfile, I installed lots of dependencies on top of an Ubuntu image. But for each apt command, I use a different RUN statement.

First, let's build the image. And also, let's time it. I'm going to track the time taken to build the whole image.

Okay. So the image took around 47 seconds to build.

And it is 266 MB in size. Now, let's reduce the number of layers. I'm going to combine all these RUN statements into a single RUN statement and run all my commands from that one RUN statement.

Okay, jq. I have everything.

And that's it: a single layer. One RUN that installs all the dependencies.
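As a rough sketch of the change, the Dockerfile below shows the one-RUN-per-command version in comments and the combined version underneath. The Ubuntu tag and the package names other than jq are assumptions chosen for illustration, not the actual list used in the video.

```dockerfile
# Assumption: the tag and most package names here are illustrative.
FROM ubuntu:22.04

# Before: every apt command had its own RUN statement, so each one
# produced its own layer, for example:
#   RUN apt-get update
#   RUN apt-get install -y curl
#   RUN apt-get install -y git
#   RUN apt-get install -y jq

# After: one RUN statement, and therefore one layer, installs everything.
RUN apt-get update && \
    apt-get install -y \
        curl \
        git \
        jq
```

Chaining with && also means the build stops at the first command that fails, instead of carrying on with a half-installed set of packages.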

Let's clean up and build again. Okay.

So in this case, we managed to reduce the size down to 262 MB from 266 MB, because we got rid of the additional layers.

Of course, this is not a lot compared to what multi-stage builds did for me, but in bigger and more complex images, this can still make a difference of about 50 MB as well. And this is low-hanging fruit, so it's a best practice to use fewer layers in your Dockerfile.

So use fewer RUN statements and even fewer COPY statements. Okay. So that was about layers.

The third technique for size reduction is to create Docker images from scratch. This is the single most powerful, but also the most challenging, way to create a Docker image.

Creating an image from scratch means that you use the scratch base image: no underlying operating system, no dependencies, no preexisting data or applications.

Think of it like an empty storage disk. You have to populate it with data, because there is nothing in it.

A scratch image by itself is very small in size. I think even less than one MB. And that one MB is just due to some metadata.

So, whatever you put in it is what contributes to the size. But this does mean that if you need any dependencies or supporting applications or tools, you will need to install them on the image yourself. So scratch images are very useful for two scenarios.

One: when you're creating your own base image. For example, you created your own Linux distribution, right?

You don't want to put this on top of another base image, like Ubuntu or something. You can use a scratch image instead, and then put your own Linux distribution on top of it. Number two: when you have a standalone executable application. For example, you compiled your Golang application or C++ application or C# or Rust or whatever, and it compiles into a final executable, right?

Just put this executable inside the scratch image. If the app has any dependencies, like a configuration file or a runtime library or other utilities, you need to add those too.

In this example, I'm using the scratch image as the base image for the final stage of my multi-stage build. This is, by the way, the same example as I showed you earlier.

The only difference is that instead of using Alpine as my final stage's base image, I'm just using scratch. So that's it. Prune again.
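A sketch of that variant is below: the same two-stage layout as before, with scratch as the base of the final stage. As in the earlier sketch, the stage name, image tag, paths and the CGO_ENABLED=0 setting are assumptions rather than details shown in the video. Static linking matters even more here, because scratch contains no shared libraries at all.

```dockerfile
# Stage 1: compile the Go application.
# Assumption: stage name, tag and paths are illustrative.
FROM golang:1.22 AS build

WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .

# Assumption: a statically linked binary, since scratch has no C library.
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: start from the empty scratch image instead of Alpine.
FROM scratch

# The executable is the only thing in the final image.
COPY --from=build /app /app

# Exec (JSON array) form is required here: scratch has no shell to
# interpret a shell-form command.
CMD ["/app"]
```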

Remove everything. Before running this, I just want to check that there is nothing on my system. Nope.

Okay. Then let's build. This image is even smaller than the Alpine image we created in the multi-stage example. That one was about 16 MB.

With scratch, this one is down to just about 7 MB, and most of this 7 MB is just the size of my final Golang executable file. The HTTP server still works.

Now, the application itself runs perfectly, but if I want to gain shell access inside this container, I cannot do that, because this container is literally built on top of scratch. All it has is my own application. It doesn't have a shell program like bash. So if I wanted the possibility of having a shell inside this container, I would need to install bash on it, along with any dependencies that bash relies on. And that's how scratch works: it's a clean-slate image which has only the things that you add to it.

Now, apart from these three techniques, there are a few more simple things you can do which will make a huge impact on your Docker image size.

Don't keep any of your application's data inside your image. This will directly add to the image's size. Instead, connect the container to an external storage volume and store your data over there, so that it's still accessible by the application but doesn't bloat your image. Alternatively, your application could also connect to an external data store like MySQL or AWS S3 and access the data from there.

Also, if you aren't already doing so, make use of the .dockerignore file. The .dockerignore concept is exactly like .gitignore. It lets you exclude specific files and folders from the build context, so they never end up in your final image. For example, you could add node_modules to your .dockerignore file.

Image compression tools like dive and DockerSlim are also very powerful. They will let you analyze your image layers to figure out where the dead weight is and what you could remove.

And lastly, if you're feeling very adventurous, you could ditch containers completely and explore unikernels. Unikernels are much smaller images, which come packed with your application and the underlying operating system, designed to run directly on a hypervisor.