When 10,000 Software Engineers Work on the Same Code

Coding with Lewis
Try the Stacking workflow: https://gt.dev/codingwithlewis
Video Transcript:
More than 10,000 software engineers updating a repository of over 2 billion lines of code. How does big tech make big and small changes to its software? You've just begun your job as a junior developer at one of the largest tech companies in the world.
Your first task beyond fixing bugs: create a video player with live reactions on the timecode. Now is your time to prove yourself. You can prove that you were meant for this job.
From junior to senior in 2 billion lines of code. A monolithic repository: a common and increasingly popular way to store your code in a single repository, even if multiple projects live inside it.
A single source of truth, code reused and shared across applications, and collaboration across different teams are the advantages of this style. The disadvantages?
Well...
Two billion lines of code needs a version control system, and Git is used almost universally. But what happens when you push it to its absolute limits, like a billion files and terabytes worth of content? Solutions like VFS for Git (a virtual file system for Git) or Google's proprietary Piper let developers clone only the files that are relevant to what they're working on. The system shows you the entire repository without having downloaded it, plus more efficiencies that make a 2-billion-line repository feel like a 10,000-line repository. So once you clone the repository, everything is a placeholder. Okay: embed a video player in a standard webpage.
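The placeholder idea can be approximated with stock Git's partial clone and sparse checkout; proprietary systems like Piper or VFS for Git take the same idea much further. A minimal sketch, using a throwaway local "remote" with made-up file names:

```shell
# Build a throwaway "remote" repository with two directories.
git init -q --bare -b main remote.git
git -C remote.git config uploadpack.allowfilter true   # permit partial-clone filters
git clone -q remote.git seed 2>/dev/null
mkdir -p seed/video seed/search
echo "player" > seed/video/player.ts
echo "search" > seed/search/index.ts
git -C seed add .
git -C seed -c user.email=dev@example.com -c user.name=dev commit -qm "init"
git -C seed push -q origin HEAD:main

# Partial clone: fetch no file contents up front, then materialize only video/.
git clone -q --no-local --no-checkout --filter=blob:none remote.git work
git -C work sparse-checkout set video
git -C work checkout -q main
ls work   # only video/ exists on disk; other blobs are fetched on demand
```

Here `--filter=blob:none` skips downloading file contents at clone time, and `sparse-checkout` materializes only the `video/` directory; anything else is fetched lazily from the remote when a command actually needs it.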
Okay, that shouldn't be too hard. So we have the frontend components to build, as well as some API endpoints and some database migrations to do. Okay, let's open up the code editor and get started.
Let's see, alright, videopage.tsx is probably where I should put this. Once the file is loaded, it will understand what files it depends on and then start downloading those as well.
That way, it's just the code that you're using that needs to be on your local machine. Everything is hosted in a gigantic Git repository in the cloud that you can push to at a later time. Whew, okay, that wasn't too hard, but I mean, it's pretty simple.
Just embed the video player. I think we can do much better. And you know what?
Let's add a cool little feature here. All right, let's submit a pull request. Oh, what?
Um, okay, whatever. Just, we'll submit anyway. What on earth happened here now?
Here's how Git works. You branch off of the main branch to make changes to your code, and then merge that code in at a later time. The longer you work on a branch separate from main, the bigger your feature grows and the further main moves ahead of it.
Eventually, you'll need to rebase it onto the updated main to continue. And most workflows have this idea that one branch is equal to one pull request. But what happens when you're working on a large feature and need your still-unmerged changes in order to work on the next feature?
Like, for example, the video player feature being finished, and then deciding to work on that timecode feature that relies on that video player. And this is where stacking comes into play.
Stacking lets you make little changes instead of single giant ones. While waiting for your video feature review, create a new branch on the existing one for parallel timecode reactions. These interdependent branches form a stack, which is compact and easily rebased onto main.
So you can develop the timecode reactions before the video player has even merged. Depending on your feature's complexity, you can keep adding small branches to your stack, even while awaiting reviews of previous branches.
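With plain Git, a stack is just a branch created on top of another branch that hasn't merged yet. A minimal sketch in a throwaway repository, with illustrative branch and file names:

```shell
# Toy repo standing in for the monorepo.
git init -q -b main demo
git -C demo -c user.email=dev@example.com -c user.name=dev commit -qm "base" --allow-empty

# Branch 1: the video player, branched off main as usual.
git -C demo checkout -q -b video-player main
echo "player" > demo/player.ts
git -C demo add player.ts
git -C demo -c user.email=dev@example.com -c user.name=dev commit -qm "video player"

# Branch 2: stacked on top of the UNMERGED video-player branch,
# so the timecode work can start while the first branch sits in review.
git -C demo checkout -q -b timecode-reactions video-player
echo "reactions" > demo/reactions.ts
git -C demo add reactions.ts
git -C demo -c user.email=dev@example.com -c user.name=dev commit -qm "timecode reactions"

# The stack: two small commits ahead of main, each reviewable on its own.
git -C demo log --oneline main..timecode-reactions
```

Each branch in the stack maps to one small pull request, which is exactly what makes the reviews quick.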
The advantages of this are that you don't rely on the main branch as the only source of truth in your development process, you don't have to wait for long periods of time for something to be added into the main branch, and you create small pieces of code that can easily be reviewed. So if our stack is five branches tall and the first branch is ready to be merged into the main branch, it merges that branch and then rebases the branches that depend on it. By this point, you could be working on completely different things, and this allows thousands of developers to continue making the software better without waiting on automated processes. Large companies have implemented this with internal tools they specifically developed: Google with Critique, or Facebook with Phabricator. Tools made by engineers to improve developer productivity. And plain Git makes it hard to implement stacking on its own, because it wasn't built to handle the dependencies and rebasing required to maintain a stacking workflow.
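This is the bookkeeping that makes manual stacking tedious: every time the bottom of the stack merges, the rest of the stack has to be re-parented with `git rebase --onto`. A minimal sketch in a self-contained throwaway repository, with illustrative names:

```shell
# Rebuild a two-branch stack in a fresh repo.
git init -q -b main stackdemo
G() { git -C stackdemo -c user.email=dev@example.com -c user.name=dev "$@"; }
G commit -qm "base" --allow-empty
G checkout -q -b video-player
echo "player" > stackdemo/player.ts
G add player.ts && G commit -qm "video player"
G checkout -q -b timecode-reactions
echo "reactions" > stackdemo/reactions.ts
G add reactions.ts && G commit -qm "timecode reactions"

# The bottom of the stack is approved: merge video-player into main...
G checkout -q main
G merge -q --no-edit video-player

# ...then re-parent the remaining branch onto the updated main.
G rebase -q --onto main video-player timecode-reactions
G log --oneline main..timecode-reactions   # only the timecode commit remains
```

With one branch on top of another this is manageable; with a five-branch stack under active review, repeating this after every merge is the part tools automate.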
So modern tooling companies like Graphite, or open-source projects like ghstack, help developers automate this process. Think about it: when you have over 10,000 engineers submitting code, having even 10 percent of them blocked on code review can cost millions of dollars. And thanks to today's sponsor, Graphite, doing code reviews is much easier and simpler, so you can constantly ship at a much faster rate.
Large tech companies like Facebook or Google have built internal code review tools that allow them to change their code base rapidly, but others are struggling to find a tool that supports the same workflow. And Graphite enables this workflow entirely. With the overarching goal of increasing developer velocity, the CLI and dashboard let engineers take advantage of the stacking workflow while syncing all of your data back to GitHub.
So, while you technically can stack with traditional Git, it's extremely tedious, and managing the dependencies manually is impossible at scale. The Graphite CLI makes stacking easy by abstracting all the dependency management away from you, letting you work with Git while simplifying trunk-based development, so you can focus on just committing your code quickly.
And just like in your Git workflow, you run simple commands to create a stack, navigate your stack, submit your stack, and more. The Graphite dashboard makes it easy to stay in the loop by alerting you which pull requests need your action, monitoring status, and giving you a visual way to group and navigate between different PRs in a stack. So whether you're an engineer on a big team or just developing solo, Graphite's developer productivity tools and stacking workflow will help you ship code faster.
If you're a junior dev looking to learn how big companies work, give Graphite a try. Graphite's really generous free tier gives you the ability to use stacking to help improve your workflow to see if it's just right for you. Some of the fastest moving tech companies are using Graphite to bring features to users much faster.
Thanks again to Graphite for sponsoring today's video. Click the link in the description to ship your code faster and help support the channel. When you merge code in, what happens when the main source code has changed multiple times in the meantime?
Merge conflicts arise. And this means that the person reviewing the code has to choose which code stays and which code goes. Like trying to fit multiple puzzle pieces into one slot.
This leads to slower release times, decreased developer productivity, and potential bugs from the wrong pieces being chosen. But we have 1600 pull requests to go through. How can this be done efficiently?
One methodology is called trunk-based development. Similar to stacking from earlier, you create small changes in your code and submit a pull request. In an extremely fast-paced environment with thousands of software engineers, reviewing a small amount of code is easier than reviewing a large amount.
It's also less likely to conflict with other code that is being submitted. The trunk refers to the main branch here, and you stay as close to it as possible. Once everything merges in, it's easier to rebase changes onto other branches and pull requests.
And for big features, other methodologies like feature flagging let you work on a feature quickly while keeping it disabled in production. This makes it easy to review pull requests that are only a day old and get them inserted into the source code as fast as possible.
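A feature flag can be as simple as a conditional around the new code path. A minimal sketch in shell, with a hypothetical flag name; real systems typically read flags from a configuration service rather than an environment variable:

```shell
# Hypothetical flag: the timecode-reactions code ships to production
# but stays dark until someone flips the flag.
FEATURE_TIMECODE_REACTIONS="${FEATURE_TIMECODE_REACTIONS:-false}"

render_video_page() {
  echo "video player"
  if [ "$FEATURE_TIMECODE_REACTIONS" = "true" ]; then
    echo "timecode reactions"
  fi
}

render_video_page
```

The merged-but-disabled code is exercised by builds and tests every day, so when the flag finally flips, the feature has been integrated with main the whole time.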
But of course, this can be automated. Even better. Merge queues.
A queue of pull requests, lined up by submission time or priority, and automatically tested against the main branch. If a pull request passes all checks, it is merged. If it fails, it's usually sent back to the developer with a report of how it failed.
Teams implement the merge queue in different ways: batching multiple pull requests together, processing them one by one, setting priority, et cetera. So with stacking and trunk-based development, thousands of developers are able to submit code daily that billions of users rely on.
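A merge queue can be sketched in a few lines of plain Git: each queued branch is test-merged into main, a stand-in check runs, and failures are backed out. The branch names and the "test suite" below are invented for illustration:

```shell
# Throwaway repo with one good PR branch and one bad one.
git init -q -b main mq
G() { git -C mq -c user.email=dev@example.com -c user.name=dev "$@"; }
G commit -qm "base" --allow-empty
for pr in pr-good pr-bad; do
  G checkout -q -b "$pr" main
  echo "$pr" > "mq/$pr.txt"
  G add "$pr.txt"
  G commit -qm "$pr"
done
G checkout -q main

# The queue: merge each PR in order, run the checks, back out failures.
for pr in pr-good pr-bad; do
  G merge -q --no-ff --no-edit "$pr"
  # Stand-in for the CI suite: any change that adds pr-bad.txt "fails".
  if [ ! -e mq/pr-bad.txt ]; then
    echo "$pr: checks passed, stays merged"
  else
    G reset -q --hard HEAD~1
    echo "$pr: checks failed, sent back to the developer"
  fi
done
```

Real merge queues run the full CI suite instead of a file check, and often batch several PRs into one speculative merge to keep throughput high.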
Awesome. Your PR was approved and merged into the main branch. How does the code that you just wrote, as well as the code that 10,000 other engineers just wrote, get to your computer? The code that developers write is processed and optimized for the machine that runs it. Two billion lines of code. How do you even get started on that? And how long would it take, when you have to build it almost daily?
This is where continuous integration comes in. Continuous integration often refers to the build stage of the software release cycle. This stage is often defined differently per company depending on what their infrastructure, their programming language, and tech stack look like.
Automated tests in the pull requests run to see if it can be merged in, but other tests can be done when it is merged in as well to see if it can work on a production server. Build tools like Bazel automate the process of accepting new code into a repository. Bazel by Google was designed for this very issue.
Extremely large codebases with billions of lines of code. It's optimized for them by caching previous build results, so it won't rebuild the entire project when just one tiny part is modified. It utilizes all the cores of a machine to parallelize the build, while also distributing the work across many different machines. And this whole process means that when you ask for a review,
it builds, it tests, it accepts, and it runs even more tests when the change is integrated into the production code. With codebases this large, multiple supercomputers are working day and night testing, building, and integrating different versions of the codebase. When supercomputers cost salaries per hour to run, a 5 percent optimization is a multi-million-dollar success.
But after tests, builds, and integrations are successful, how do we deliver it? Continuous delivery. One wrong move can mean millions of dollars lost in downtime; even ten minutes of downtime could cost millions instantly.
We can't just click update and cross our fingers, so we need to make sure that never happens. This is where continuous delivery comes in. The ideology is that the infrastructure powering our application should be disposable and reproducible.
Need 10 new servers? No problem. With Terraform, we just describe what we want in code.
Then the servers are delivered, preconfigured to our exact specifications. Then you have site reliability engineers who use monitoring tools to predict when catastrophic events may occur. When the code gets to its destination, we release changes bit by bit.
Canary deployments first, then on to blue-green once things check out. If issues arise, we halt and roll back immediately. This makes it as risk-free as possible.
Then all the new changes are released, and our users start to see the new features or bug fixes. For them, it was just a nice little upgrade.
For us, this was a multi million dollar operation that took thousands of engineers hours and hours of trial and error. Congratulations, your users have tried out your feature and already have something to complain about. But you did it.
You deployed a feature that hundreds of millions of users are going to be using daily. Before you can celebrate, you get another feature request and a bug fix. So you open up your code editor and get right back to it.
Whenever you find a silly little bug, you often think: why can't they just quickly fix this small little bug? When I was working on small projects, sometimes it was as easy as deploying it within the next hour. But now it's easy to understand why things take so long to get into production.
As soon as you open up the code editor, your code has to stop in a dozen places before it's even deemed eligible to go into production. So large tech companies hire thousands of software engineers to help fix these problems. But then more problems arise when you have thousands of software engineers, and this creates a brand-new market for software that helps developers scale their development, like Graphite.
If you like this video, make sure you check out the video where I show how Discord stores trillions of messages. Pretty insane. And every piece of software has a different solution.
In this video I took a lot of inspiration from how Google structures their massive code base as well as how Microsoft handles the version control of the Windows operating system.