If you want to level up from a junior developer to a senior engineer or land a high paying job at a big tech company, you need to learn system design.  But where do you start?  To master system design, you first need to understand the core concepts and fundamental building blocks that come up when designing real world systems or tackling system design interview questions. 
In this video, I will break down the 30 most important system design concepts you need to know.  Learning these concepts helped me land high paying offers from multiple big tech companies.  And in my 8 years as a software engineer, I've seen them used repeatedly when building and scaling large scale systems. 
Let's get started.  Almost every web application that you use is built on this simple yet powerful concept called client server architecture.  Here is how it works. 
On one side, you have a client.  This could be a web browser, a mobile app or any other frontend application.  And on the other side, you have a server, a machine that runs continuously waiting to handle incoming request. 
The client sends a request to store, retrieve or modify data.  The server receives the request, processes it, performs the necessary operations, and sends back a response.  This sounds simple, right? 
But there is a big question.  How does the client even know where to find a server?  A client doesn't magically know where a server is. 
It needs an address to locate and communicate with it.  On the internet, computers identify each other using IP addresses, which works like phone numbers for servers.  Every publicly deployed server has a unique IP address. 
Something like this.  When a client wants to interact with a service, it must send request to the correct IP address.  But there's a problem. 
When we visit a website, we don't type its IP address.  We just enter the website name.  Right? 
Instead of relying on hard to remember IP addresses, we use something much more human friendly, domain names.  But we need a way to map a domain name to its corresponding IP address.  This is where DNS or domain name system comes in. 
It maps easy to remember domain names like algo masteraster. io io to their corresponding IP addresses.  When you type algo masteraster.
io into your browser, your computer asks a DNS server for the corresponding IP address.  Once the DNS server responds with the IP, your browser uses it to establish a connection with the server and make a request.  You can find the IP address of any domain name using the ping command. 
When you visit a website, your request doesn't always go directly to the server.  Sometimes it passes through a proxy or reverse proxy first.  A proxy server acts as a middleman between your device and the internet. 
When you request a web page, the proxy forwards your request to the target server, retrieves the response, and sends it back to you.  A proxy server hides your IP address, keeping your location and identity private.  A reverse proxy works the other way around. 
It intercepts the client request and forwards them to the back end server based on predefined rules.  Whenever a client communicates with a server, there is always some delay.  One of the biggest cause of this delay is physical distance. 
For example, if our server is in New York, but a user in India sends a request, the data has to travel halfway across the world and then the response has to make the same long trip back.  This roundtrip delay is called latency.  High latency can make applications feel slow and unresponsive. 
One way to reduce latency is by deploying our service across multiple data centers worldwide.  This way, users can connect to the nearest server instead of waiting for data to travel across the globe.  Once a connection is made, how do clients and servers actually communicate? 
Every time you visit a website, your browser and the server communicate using a set of rules called HTTP.  That's why most URLs is start with HTTP or it secure version HTTPS.  The client sends a request to the server. 
This request includes a header containing details like the request type, browser type, and cookies and sometimes a request body which carries additional data like form inputs.  The server processes the request and responds with an HTTP response either returning the requested data or an error message if something goes wrong.  HTTP has a major security flaw. 
It sends data in plain text.  Modern websites use HTTPS.  HTTPS encrypts all data using SSL or TLS protocol ensuring that even if someone intercepts the request, they can't read or alter it. 
But clients and servers don't directly exchange raw HTTP requests and response.  HTTP is just a protocol for transferring data, but it doesn't define how requests should be structured, what format responses should be in, or how different clients should interact with the server.  This is where APIs or application programming interfaces come in. 
Think of an API as a middleman that allows clients to communicate with servers without worrying about low-level details.  A client sends a request to an API.  The API hosted on a server processes the request, interacts with databases or other services, and prepares a response. 
The API sends back the response in a structured format, usually JSON or XML, which the client understands and can display.  There are different API styles to serve different needs.  Two of the most popular ones are REST and GraphQL. 
Just a quick note to keep this video concise, I'm covering these topics at a high level, but if you want to go deeper and learn these topics in more detail, check out my blog at blog. algammaster. io. 
Every week I publish in-depth articles on complex system design topics with clear explanations and real world examples.  Make sure to subscribe so that you don't miss my new articles.  Among the different API styles, REST is the most widely used. 
A REST API follows a set of rules that defines how clients and servers communicate over HTTP in a structured way.  REST is stateless.  Every request is independent. 
Everything is created as a resource.  For example, users, orders, products.  It uses standard HTTP methods like get to retrieve data, post to create new data, put to update existing data, and delete to remove data. 
Rest APIs are great because they are simple, scalable, and easy to cast, but they have limitations, especially when dealing with complex data retrieval.  REST endpoints often return more data than needed, leading to inefficient network uses.  To address these challenges, GraphQL was introduced in 2015 by Facebook. 
Unlike REST, GraphQL lets client ask for exactly what they need.  Nothing more, nothing less.  With a REST API, if you need a user's profile along with their recent post, you might have to make multiple requests to different endpoints. 
With GraphQL, you can combine those requests into one and fetch exactly the data you need in a single query.  The server responds with only the requested fields.  However, GraphQL also comes with trade-offs. 
It requires more processing on the server side, and it isn't as easy to cast as REST.  Now when a client makes a request they usually want to store or retrieve data.  But this brings up another question where is the actual data stored. 
If our application deals with small amounts of data we could store it as a variable or as a file and load it in memory.  But modern applications handle massive volumes of data far more than what memory can efficiently handle.  That's why we need a dedicated server for storing and managing data. 
A database.  A database is the backbone of any modern application.  It ensures that data is stored, retrieved and managed efficiently while keeping it secure, consistent and durable. 
When a client request to store or retrieve data, the server communicates with the database, fetches the required information and returns it to the client.  But not all databases are the same.  In system design, we typically choose between SQL and NoSQL databases. 
SQL databases store data in tables with a strict predefined schema and they follow ACIT properties.  Because of these guarantees, SQL databases are ideal for applications that require strong consistency and structured relationships such as banking systems.  NoSQL databases on the other hand are designed for high scalability and performance. 
They don't require a fixed schema and use different data models including key value stores, document stores, graph databases, and wide column stores which are optimized for large scale distributed data.  So, which one should you use?  If you need structured relational data with a strong consistency, SQL is the better choice. 
If you need high scalability, flexible schema, NoSQL is the better choice.  Many modern applications use both SQL and NoSQL together.  As our user base grows, so does the number of requests hitting our application servers. 
One of the quickest solutions is to upgrade the existing server by adding more CPU, RAM or storage.  This approach is called vertical scaling or scaling up which makes a single machine more powerful.  But there are some major limitations with this approach. 
You can't keep upgrading a server forever.  Every machine has a maximum capacity.  More powerful servers become exponentially more expensive. 
If this one server crashes, the entire system goes down.  So, while vertical scaling is a quick fix, it's not a long-term solution for handling high traffic and ensuring system reliability.  Let's look at a better approach, one that makes our system more scalable and fall tolerant. 
Instead of upgrading a single server, what if we add more servers to share the load?  This approach is called horizontal scaling or scaling out where we distribute the workload across multiple machines.  More servers is equal to more capacity which means the system can handle increasing traffic more effectively. 
If one server goes down, others can take over which improves reliability.  But horizontal scaling introduces a new challenge.  How do clients know which server to connect to? 
This is where a load balancer comes in.  A load balancer sits between clients and backend servers acting as a traffic manager that distributes request across multiple servers.  If one server crashes, the load balancer automatically redirects traffic to another healthy server. 
But how does a load balancer decide which server should handle the next request?  It is a load balancing algorithm such as roundroin, least connections and IP hashing.  So far we have talked about scaling our application servers. 
But as traffic grows, the volume of data also increases.  At first we can scale a database vertically by adding more CPU, RAM and storage similar to application servers.  But there is a limit of how much a single machine can handle. 
So let's explore other database scaling techniques that can help manage large volumes of data efficiently.  One of the quickest and most effective ways to speed up database read queries is indexing.  Think of it like the index page at the back of a book. 
Instead of flipping through every page, you jump directly to the relevant section.  A database index works the same way.  It's a super efficient lookup table that helps the database quickly locate the required data without scanning the entire table. 
An index is stores column values along with pointers to actual data rows in the table.  Indexes are typically created on columns that are frequently queried such as primary keys, foreign keys, and columns frequently used in rare conditions.  While indexes speed up reads, they slow down rights since the index needs to be updated whenever data changes. 
That's why we should only index the most frequently accessed columns.  Indexing can significantly improve read performance.  But what if even indexing isn't enough and our single database server can't handle the growing number of read request? 
That's where our next database scaling technique, replication, comes in.  Just like we added more application servers to handle increasing traffic, we can scale our database by creating copies of it across multiple servers.  Here is how it works. 
We have one primary database also called the primary replica that handles all write operations.  We have multiple read replicas that handle read queries.  Whenever data is written to primary database, it gets copied to read replicas so that they stay in sync. 
Replication improves the read performance since read requests are spread across multiple replicas reducing the load on each one.  This also improves availability since if the primary replica fails, a read replica can take over as the new primary.  Replication is great for scaling read heavy applications. 
But what do we need to scale right operations or store huge amounts of data?  Let's say our service became popular.  It now has millions of users and our database has grown to terabytes of data. 
A single database server will eventually struggle to handle all this data efficiently.  Instead of keeping everything in one place, we split the database into smaller, more manageable pieces and distribute them across multiple servers.  This technique is called sarding. 
Here is how it works.  We divide the database into smaller parts called Sards.  Each sard contains a subset of the total data. 
Data is distributed based on the sarding key.  For example, user ID.  By distributing data this way, we reduce database load since each SA handles only a portion of queries and speed of read and write performance since queries are distributed across multiple SS instead of hitting a single database. 
Singing is also referred to as horizontal partitioning since it splits data by rows.  But what if the issue isn't the number of rows but rather the number of columns?  In such cases, we use vertical partitioning where we split the database by columns. 
Imagine we have a user table that stores profile details, login history, and billing information.  As this table grows, queries become slower because the table must scan many columns even when a request only needs a few specific fields.  To optimize this, we use vertical partitioning where we split user table into smaller, more focused tables based on users patterns. 
This improves query performance since each request only scans relevant columns instead of the entire table.  It also reduces unnecessary disk IO making data retrieval quicker.  However, no matter how much we optimize the database, retrieving data from disk is always slower than retrieving it from memory. 
What if we could store frequently access data in memory?  This is called caching.  Caching is used to optimize the performance of a system by storing frequently access data in memory instead of repeatedly fetching it from database. 
One of the most common caching strategies is the cash aside pattern.  Here is how it works.  When a user requests the data, the application first checks the C. 
If the data is in the cache, it's returned instantly avoiding a database call.  If the data is not in the C, the application retrieves it from the database.  It stores it in the C for future request and returns it to the user. 
Next time the same data is requested, it's served directly from C, making the request much faster.  To prevent outdated data from being served, we use time to live value or TTL.  Let's look at next database scaling technique. 
Most relational database use normalization to store data efficiently by breaking it into separate tables.  While this reduces redundancy, it also introduces joins.  When retrieving data from multiple tables, the data must combine them using join operations, which can slow down queries as the data set grows. 
Denormalization reduces the number of joints by combining related data into a single table, even if it means some data gets duplicated.  For example, instead of keeping users and orders in a separate table, we create user orders table that stores user details along with the latest orders.  Now, when retrieving a user's order history, we don't need a join operation. 
The data is already stored together leading to faster queries and better read performance.  Denormalization is often used in read heavy applications where speed is more critical.  But the downside is it leads to increased storage and more complex update request. 
As we scale our system across multiple servers, databases and data centers, we enter the world of distributed systems.  One of the fundamental principles of distributed systems is the cap theorem which states that no distributed system can achieve all three of the following at the same time.  Consistency, availability, and partition tolerance. 
Since network failures are inevitable, we must choose between consistency plus partition tolerance or availability plus partition tolerance.  If you want to learn about cap theorem in more detail, you can check out this article on my blog called Cap theorem explained.  Most modern applications don't just store text record. 
They also need to handle images, videos, PDFs, and other large files.  Traditional databases are not designed to store large unstructured files efficiently.  So, what's the solution? 
We use blob storage like Amazon S3.  Blobs are like individual files like images, videos, or documents.  These blobs are stored inside logical containers or buckets in the cloud. 
Each file gets a unique URL making it easy to retrieve and serve over the web.  There are several advantages with using blob storage like scalability, pay as you go model, automatic replication, easy access.  A common use case is to stream audio or video files to user applications in real time. 
But streaming the video file directly from blob storage can be slow, especially if the data is stored in a distant location.  For example, imagine you are in India trying to watch a YouTube video that's hosted on a server in California.  Since the video data has to travel across the world, this could lead to buffering and slow load times. 
A content delivery network or CDN solves this problem by delivering content faster to users based on their location.  A CDN is a global network of distributed servers that work together to deliver web content like HTML pages, JavaScript files, images, and videos to users based on their geographic location.  Since content is served from the closest CDN server, users experience faster load times with minimal buffering. 
Let's move to the next system design concept which can help us build realtime applications.  Most web applications use HTTP which follows a request response model.  The client sends a request. 
The server processes the request and sends a response.  If the client needs new data, it must send another request.  This works fine for static web pages but it's too slow and inefficient for real-time applications like live chat applications, stock market dashboards or online multiplayer games. 
With HTTP, the only way to get real-time update is through frequent polling, sending repeated request every few seconds.  But polling is inefficient because it increases the server load and waste bandwidth.  As most responses are empty when there is no new data, webockets solve this problem by allowing continuous two-way communication between the client and the server over a single persistent connection. 
The client initiates a websocket connection with the server.  Once established, the connection remains open.  The server can push updates to the client at any time without waiting for a request. 
The client can also send messages instantly to the server.  This enables real-time interactions and eliminates the need for polling.  Webockets enables real-time communication between a client and a server. 
But what if a server needs to notify another server when an event occurs?  For example, when a user makes a payment, the payment gateway needs to notify your application instantly.  Instead of constantly pulling an API to check if an event has occurred, web hooks allow a server to send an HTTP request to another server as soon as the event occurs. 
Here is how it works.  The receiver, for example, your app registers a web hook URL with the provider.  When an event occurs, the provider sends a HTTP post request to the web hook URL with event details. 
This saves server resources and reduces unnecessary API calls.  Traditionally applications were built using a monolithic architecture where all features are inside one large codebase.  This setup works fine for small applications but for large scale systems monoliths become hard to manage, scale and deploy. 
The solution is to break down your application into smaller independent services called microservices that work together.  Each microser handles a single responsibility has its own database and logic so it can scale independently.  communicates with other microservices using APIs or message cues. 
This way, services can be scared and deployed individually without affecting the entire system.  However, when multiple microservices need to communicate, direct API calls aren't always efficient.  This is where message cues come in. 
Synchronous communication, for example, waiting for immediate responses doesn't scale well.  A message Q enables services to communicate asynchronously, allowing requests to be processed without blocking other operations.  Here is how it works. 
There is a producer which places a message in the queue.  The queue temporarily host the message.  The consumer retrieves the message and processes it. 
Using message cues, we can decouple services and improve the scalability and we can prevent overload on internal services within our system.  But how do we prevent overload for the public APIs and services that we deploy?  For that we use rate limiting. 
Imagine a bot starts making thousands of requests per second to your website.  Without restrictions, this could crash your servers by consuming all available resources and degrade performance for legitimate users.  Rate limiting restricts the number of requests a client can send within a specific time frame. 
Every user or IP address is assigned a request kota, for example, 100 requests per minute.  If they exceed this limit, the server blocks additional requests temporarily and returns an error.  There are various rate limiting algorithms. 
Some of the popular ones are fix window, sliding window, and token bucket.  We don't need to implement our own rate limiting system.  This can be handled by something called an API gateway. 
An API gateway is a centralized service that handles authentication, rate limiting, logging, monitoring, request routing, and much more.  Imagine a microservices-based application with multiple services.  Instead of exposing each service directly, an API gateway acts as a single entry point for all client request. 
It routes the request to the appropriate microser and the response is sent back through the gateway to the client.  API gateway simplifies API management and improves the scalability and security.  In distributed systems, network failures and service retries are common. 
If a user accidentally refreshes a payment page, the system might receive two payment request instead of one.  Adam potency ensures that repeated request produced the same result as if the request was made only once.  Here is how it works. 
Each request is assigned a unique ID.  Before processing, the system checks if the request has already been handled.  If yes, it ignores the duplicate request. 
If no, it processes the request normally.  If you enjoyed this video, I think you will love my weekly newsletter where I dive deeper into system design concepts with real world examples.  I also share articles on system design interview questions and tips to help you prepare for interviews. 
You can subscribe it at blog. algamaster. io.