How did just three software engineers power one of the fastest-rising apps of all time, in an industry where billions flood into startups to hire armies of software engineers? How did only three pull off this incredible feat?
In today's video, we'll talk about how Instagram was able to scale to 14 million users with only three software engineers. Welcome back to the channel. If you're new here, my name is Lewis, and I'm on a mission to inspire developers and tech enthusiasts.
After raising $500,000 to develop an app called Burbn, the founders decided their app was too similar to Foursquare and reinvented it with a focus on photo sharing, stripping away a ton of its original features.
Eight weeks later, they gave it to friends to beta test and fixed the bugs that came up. Then it was almost time to launch. Instagram was released on the iOS App Store on October 6, 2010, and within hours the app had over 25,000 users, then 100,000 after the first week, then over 1 million three months later. The timing behind Instagram was perfect.
The iPhone 4 had just been released to the public, and its camera was showing people the real power of a phone camera. The explosion in growth was incredible. Growing to a million users in a few months is a good problem to have.
But that $500,000 was slowly being eaten up, so the founders raised another $25 million to expand the team, both because the whole world was now looking at Instagram and to deal with the rapidly growing user base.
They hired more people, but not a lot of people. A small group of software engineers was brought on to help with development, but their first job was putting out the constant fires. So with three engineers tasked with scaling this application to 14 million users, they needed to lay down some important principles.
Keep it very simple. Don't reinvent the wheel. Go with proven and solid technology.
And after months and months of work, this is what they put together. But first: developers are always in critical danger if vulnerabilities in their app get missed, and thanks to our sponsor Snyk, keeping your source code secure and up to date is extremely easy.
AI tools like GitHub Copilot and ChatGPT are helping programmers code a lot faster, but over 53% of programmers say that AI has introduced vulnerabilities into their code. So how do we prevent that from happening? With Snyk Code, which is powered by DeepCode AI, you can detect vulnerabilities in real time through an integration with your favorite code editor and fix them right after you paste the code in.
That includes the packages you install as well: Snyk intelligently analyzes your configuration files, your dependencies, and your Docker containers. So when I generate some code that writes a user to the database, Snyk is able to tell me right away that something isn't adding up. This way you get the benefits of AI generating code for you while also keeping your code up to security standards, and with Snyk's generous free-forever plan, you can feel more confident writing code, whether it's written by you or your AI buddy. I've been using Snyk personally for a very long time, and it's an absolute privilege to have them as a long-term sponsor of this channel.
Sign up for Snyk for absolutely free with the link in the description below and start actually having confidence in your deployments. Load balancers. Now, as soon as you open the application, Instagram pings their servers to bring you relevant information.
But how does the request know where to go to get that information? This is where load balancers come in. Instagram's engineers used Amazon's Elastic Load Balancer for this.
Load balancing is a common concept in computer infrastructure. If you had a single server handling all of this traffic, you would have to constantly upgrade it to more powerful machines. If you scale horizontally instead, you can take machines of different sizes and distribute the traffic across them accordingly.
So the load balancer decides where the traffic goes. In this instance, the engineers used three NGINX instances that could be swapped in and out at any point.
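Just to make the idea concrete, here's a toy round-robin picker in Python. This is not Instagram's actual setup (they had Amazon's ELB in front of NGINX); the hostnames are made up, and it only illustrates "hand each request to the next server in the rotation."

```python
from itertools import cycle

# Toy illustration of round-robin load balancing.
# The hostnames below are invented for the example.
app_servers = cycle([
    "app-server-01.internal",
    "app-server-02.internal",
    "app-server-03.internal",
])

def route_request(request_path: str) -> str:
    """Pick the next application server for an incoming request."""
    server = next(app_servers)
    print(f"routing {request_path} -> {server}")
    return server

route_request("/api/v1/feed")
route_request("/api/v1/media/42/likes")
```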
The amount of traffic coming in at any given moment is massive, and those NGINX servers then have to decide which server to send it on to: the application servers. Think of the main code that the Instagram backend sits on; it lives on one of these servers.
Oh yeah. So it's not just one of these servers. It's 25 of them.
Each of these machines is designed to handle high CPU usage, and each of them has the Django framework installed. Django is an open-source web framework built on top of Python that lets you migrate databases, set specific routes for your information, and handle authentication. Django is extremely popular in the Python community, and many big websites still use it to this day.
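To give a feel for what that looks like, here's a tiny sketch of a Django route and view. The endpoint, fields, and numbers are invented, and this uses modern Django syntax; back in 2010 the URL config used the older regex-based style.

```python
# A made-up endpoint, just to show the Django shape: a URL pattern maps to a
# plain Python view function that returns a response.
from django.http import JsonResponse
from django.urls import path

# views.py
def media_detail(request, media_id):
    # A real view would hit the ORM here, e.g. Media.objects.get(pk=media_id)
    return JsonResponse({"id": media_id, "caption": "sunset", "likes": 90})

# urls.py
urlpatterns = [
    path("media/<int:media_id>/", media_detail),
]
```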
So once the load balancer directs traffic to one of these 25 servers, that traffic hits what is called Gunicorn. G-unicorn? Gun-a-corn? Does anyone actually know how to pronounce this? It's a WSGI server, which acts like a bridge between NGINX and the actual Python code, in this instance Django.
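Under the hood, WSGI is just a calling convention: the server calls a Python function for every request. A minimal example, separate from Django, looks like this.

```python
# wsgi_app.py -- the WSGI contract in its smallest form.
# Gunicorn (or any WSGI server) calls this function once per request,
# passing the request environment and a callback for the response status/headers.
def application(environ, start_response):
    body = b"Hello from one of the 25 app servers\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Roughly how you'd serve it (shell command, not Python):
#   gunicorn --workers 4 wsgi_app:application
# Django ships its own WSGI callable, so in practice Gunicorn points at that instead.
```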
But this presents another issue. How do we make sure that all 25 machines have the exact same code on them?
And how do we make sure we can control all 25? The engineers used something called Fabric, which lets them execute the same command across every machine at once. These machines are also stateless, meaning no individual machine stores request-specific information.
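A fleet-wide Fabric command looks roughly like the sketch below. This uses the modern Fabric 2 API, and the hostnames, paths, and commands are placeholders, not Instagram's real deploy script.

```python
# deploy.py -- run the same commands on every app server in one go.
from fabric import SerialGroup

app_servers = SerialGroup(
    "app-server-01.internal",
    "app-server-02.internal",
    # ...and so on for the rest of the fleet (placeholders, not real hosts)
)

def deploy():
    # Pull the same revision everywhere, then restart the app server process.
    app_servers.run("cd /srv/instagram && git pull origin main")
    app_servers.run("sudo systemctl restart gunicorn")

if __name__ == "__main__":
    deploy()
```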
Because they're stateless, your request is treated the same no matter which machine it lands on. But if the data isn't being stored on the machines, then where is it being stored? Data storage.
One of the most important parts of any software company, if not the most important, is its data. Data like users, photo metadata, tags, and more all lives inside a PostgreSQL database. But it can't just be one database: it was 12 extra-large-memory instances, each with a replica backing it up.
But which database do we even use for a given piece of data? They needed a quick way to match metadata with the correct database to store it in. For that, they needed a system that could store and retrieve data at lightning speed.
So for this, they used Redis. Redis is a data structure store that keeps data in memory so it can read and write extremely fast. Writing to disk takes longer than writing to memory, so for operations that need to happen lightning fast, like on a busy web server, it's exactly what you need.
The Redis cache stored the media ID along with the user ID it belongs to, so the application knew which database to go to, knocking off even more potential latency.
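Here's a rough sketch of that kind of lookup with the redis-py client. The key naming and the modulo-style shard choice are assumptions for illustration, not Instagram's published scheme.

```python
# Illustrative only: remember which user owns a photo in Redis,
# then derive which Postgres instance holds that user's data.
import redis

r = redis.Redis(host="localhost", port=6379)

NUM_SHARDS = 12  # the 12 Postgres instances mentioned above

def remember_owner(media_id: int, user_id: int) -> None:
    # Redis keeps this mapping in memory, so lookups are far faster than a DB hit.
    r.set(f"media:{media_id}:owner", user_id)

def shard_for_media(media_id: int) -> int:
    user_id = int(r.get(f"media:{media_id}:owner"))
    return user_id % NUM_SHARDS  # pick which Postgres database to query

remember_owner(media_id=1001, user_id=42)
print(shard_for_media(1001))  # -> 6
```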
But now it's on to the database itself, and here's an issue: databases, and PostgreSQL especially, have a really heavy architecture when it comes to handling connections. Every time a new connection starts, Postgres spawns a new process, performs the operation, and then closes the connection, which is fine when connections are long-lived.
But what happens when requests are pouring in at massive scale, and very quickly? To solve this, something sits between the server and the database: a connection pooler.
Rather than making a new connection on the spot, the pooler keeps a set number of connections already open. When a request comes in, the pooler hands it one of those existing connections and takes it back when the request is finished. And when the database comes back with the data, it gets put into a cache powered by six instances of Memcached. Think about opening Instagram and tapping on someone's profile: hitting the database for the same information over and over is extremely inefficient and slows everything down.
So instead, frequently made requests are stored in a cache, and when someone asks for that data again, it's served straight from the extremely fast memory store. That keeps pressure off the database, which is such a core and pivotal part of any infrastructure. So that's where things like tags and metadata live.
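Here's a rough sketch of those two ideas together, using psycopg2's built-in connection pool and pymemcache as stand-ins. The connection details, query, and key names are placeholders, not Instagram's actual components.

```python
# Cache-aside with a connection pool: check Memcached first, only hit Postgres
# on a miss, and reuse pooled connections instead of opening a new one per request.
import json
from psycopg2 import pool
from pymemcache.client.base import Client as MemcacheClient

pg_pool = pool.SimpleConnectionPool(
    minconn=2, maxconn=20,
    dsn="dbname=photos user=app host=localhost",  # placeholder DSN
)
cache = MemcacheClient(("localhost", 11211))

def get_profile(user_id: int) -> dict:
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: no database work at all

    conn = pg_pool.getconn()               # borrow an existing connection
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT username, bio FROM users WHERE id = %s", (user_id,))
            username, bio = cur.fetchone()
        profile = {"username": username, "bio": bio}
    finally:
        pg_pool.putconn(conn)              # hand the connection back to the pool

    cache.set(key, json.dumps(profile), expire=300)  # cache for 5 minutes
    return profile
```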
But this is Instagram we're talking about, so what about the images? Object storage. One of the most convenient ways to store files is a service like Amazon S3, which lets you upload terabytes' worth of objects in whatever file structure you choose.
But rather than sending the object directly through Instagram's servers, it does something different. When the request hits the application servers, the server asks S3 to generate a signed URL that's only valid temporarily. It sends that URL back to your phone so the file can be uploaded directly to the cloud storage instead of through Instagram's own servers.
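With today's boto3 (which didn't exist back then), generating that kind of temporary signed URL looks roughly like this. The bucket name, key scheme, and expiry are made up.

```python
# Generate a short-lived signed URL so the phone can upload the photo
# straight to S3, bypassing Instagram's application servers.
import uuid
import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: int) -> dict:
    key = f"photos/{user_id}/{uuid.uuid4()}.jpg"   # placeholder key scheme
    url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "example-photo-bucket", "Key": key},
        ExpiresIn=300,  # the URL only works for 5 minutes
    )
    # The app server returns this to the phone; the phone PUTs the image bytes
    # directly to S3, and only the key/metadata goes into the database.
    return {"upload_url": url, "s3_key": key}
```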
The metadata and the URL of where the image lives are then saved in the database. So once all the queries are made and the data is ready to go back to the user, there's one more step: before sending it back, the object storage URL gets swapped for a CDN URL. A content delivery network is exactly what it sounds like.
There are servers all around the world that can deliver files from somewhere closer to your users. Similar to our Memcached setup, when a user requests a brand-new file from the CDN, it takes a little longer the first time because it has to be fetched from the origin, but it's then cached at that CDN edge location and can be served quickly to everyone in the surrounding area.
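Conceptually, that last swap is just rewriting the host on the stored object URL before it goes out to the client. A toy sketch, with both domains invented:

```python
from urllib.parse import urlparse

# Toy URL rewrite: serve the image from the CDN edge instead of straight from S3.
CDN_HOST = "media.example-cdn.com"

def to_cdn_url(s3_url: str) -> str:
    parts = urlparse(s3_url)
    return f"https://{CDN_HOST}{parts.path}"

print(to_cdn_url("https://example-photo-bucket.s3.amazonaws.com/photos/42/abc.jpg"))
# -> https://media.example-cdn.com/photos/42/abc.jpg
```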
So the object storage link gets translated into the CDN link and sent back to the user. That's when your feed starts to populate: the data renders on the screen, the photos start downloading to your device, and all of that just for you to close Instagram immediately.
Wait. If my app was closed, how did that notification get to me? That's the Apple Push Notification Service, a service provided by Apple that lets companies send it payloads, which it then delivers to users' devices. But with over 90 likes happening per second, how do they do that at scale?
Another server running the open-source software pyapns was dedicated to the task of sending out push notifications as fast as possible. So as soon as a like happens, a push notification is triggered using the user ID. But how can that scale to 90 likes per second? Task queues. It's a common software infrastructure practice to set and forget large processes, and in this case the three engineers used Gearman for it: big jobs like processing a photo after it's been uploaded, sharing to Facebook or Twitter after you post, or sending push notifications all get put in a queue.
Then over 200 worker processes would chew through that queue, completing tasks one by one. Sharing to Facebook could happen 10 seconds after you upload and it doesn't really bother you, so being able to push certain jobs off to be done later is a huge part of what makes an application feel fast.
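Queueing a job with the old python-gearman client (a Python 2-era library) looked roughly like this. The task name, payload, and worker body are made up for the example.

```python
# Producer side: fire-and-forget a push-notification job and return immediately,
# so the like request itself stays fast.
import json
import gearman

client = gearman.GearmanClient(["localhost:4730"])

def on_photo_liked(liker_id, photo_owner_id):
    payload = json.dumps({"to_user": photo_owner_id, "from_user": liker_id})
    # background=True means "don't wait for the result"; a worker picks it up later.
    client.submit_job("send_like_push", payload, background=True)

# Worker side (one of the ~200 worker processes):
def send_like_push(gearman_worker, gearman_job):
    data = json.loads(gearman_job.data)
    # ...call the push-notification service here (omitted in this sketch)...
    return "ok"

worker = gearman.GearmanWorker(["localhost:4730"])
worker.register_task("send_like_push", send_like_push)
# worker.work()  # blocks, pulling jobs off the queue forever
```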
So when you have a hundred instances running at the same time doing different things, how do you even monitor all of it, especially when you're only three engineers? Monitoring these systems is one of the most important things you can do.
So the Instagram team used Munin to see graphs across everything in their system. It also allowed them to set thresholds on those graphs and send alerts when things got too high. And its open-source nature makes it really easy to add your own plugins for your unique situations, like photos per second, signups per minute, and more.
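A Munin plugin is just a small executable that prints values in a simple text protocol. A toy "signups per minute" plugin might look like this; where the number actually comes from is up to you, and here it's faked.

```python
#!/usr/bin/env python
# Toy Munin plugin: Munin runs this every poll interval.
# Called with "config" it describes the graph; otherwise it prints the current value.
import sys
import random

def current_signups_per_minute():
    # Placeholder: a real plugin would read this from Redis, a log, or the database.
    return random.randint(0, 500)

if len(sys.argv) > 1 and sys.argv[1] == "config":
    print("graph_title Signups per minute")
    print("graph_category instagram")
    print("signups.label signups/min")
    print("signups.warning 400")   # Munin can alert when the value crosses this
else:
    print(f"signups.value {current_signups_per_minute()}")
```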
And tools like Sentry, which lives inside your application, make it easy for developers to track bugs. Sentry will send you the stack trace for a bug before the user even reports it. At this point, these monitoring tools are probably how the three engineers figured out what they had to work on next. So when we look back at the core principles, we can see they kept it very simple.
Yeah, I know, doing all of this probably doesn't sound simple. But each piece has one clear job, and never once would you question the role of a specific part of the infrastructure.
And because of that, I think they kept it very simple. Don't reinvent the wheel: something you see in big tech is software like this built in-house, geared only towards that company and its specific needs.
Of course, that makes sense when you're handling billions of users and the open-source options might not accommodate your needs 100%. But what I love is that all of the software Instagram used, you can go onto GitHub and download right now and take advantage of yourself. In a way, that's very inspiring as a developer. And as for going with proven and solid technology, the software Instagram used has all passed the test of time.
Even the proprietary services like the AWS load balancers are still being used. NGINX, Redis, Gunicorn (does anyone have the pronunciation answer yet, by the way?),
Gearman, and more are all still being used today. And even back then, it wasn't really a risk to use them. What I love about seeing this infrastructure is that if you map it out the way we did, you can almost see at what point they decided to bring in each piece of software and each server.
When you're a three-person team, you just have to do what you have to do. The article this is based on was posted 12 years ago, so make sure you check it out with the link in the description below. And if you love deep dives into how your favorite software works behind the scenes, then make sure you check out my Discord video on how they scaled to trillions of messages.
It's insane.