this complete system design tutorial covers scalability reliability data handling and high level architecture with clear explanations real world examples and practical strategies hike will teach you the Core Concepts you need to know for a system designs interview this is a complete crash course on system design interview Concepts that you need to know to as your job interview the system design interview doesn't have to do much with coding and people don't want to see you write actual code but how you glue an entire system together and that is exactly what we're going to cover in this
tutorial we'll go through all of the concepts that you need to know to as your job interview before designing large scale distributed systems it's important to understand the high level architecture of the individual computer let's see how different parts of the computer work together to execute our code computers function through a layered system each optimized for varying tasks at Decor computers understand only binary zeros and ones these are represented as bits one bit is the smallest data unit in Computing it can be either zero or one one bite consists of eight bits and it's used
to represent a single character like a or number like one expanding from here we have kilobyte megabyte gigabytes and terabytes to store this data we have computer disk storage which holds the primary data it can be either htd or SS D type the disk storage is nonvolatile it maintains data without power meaning if you turn off or restart the computer the data will still be there it contains the OS applications and all user files in terms of size discs typically range from hundreds of gigabytes to multiple terabytes while ssds are more expensive they offer significantly
faster data retrieval than HDD for instance an SSD may have a r speed of 500 MB per second to 3,500 while an HDD might offer 80 to 160 mb per second the next immediate access point after dis is the Ram or random access memory RAM serves as the primary active data holder and it holds data structures variables and applications data that are currently in use or being processed when a program runs its variables intermediate computations runtime stack and more are stored in Ram because it allows for a quick read and write access this is a
volatile memory which means that it requires power to retain its contents and after you restart the computer the data may not be persisted in terms of size Rams range from a few Gaby in consumer devices to hundreds of gabt in high-end servers their read right speed often surpasses 5,000 megabytes per second which is faster than even the fastest SS this dis speed but sometimes even this speed isn't enough which brings us to the cache the cache is smaller than Ram typically it's measured in megabytes but access times for cach memory are even faster than Ram
offering just a few Nan for the L1 cache the CPU first checks the L1 cach for the data if it's not found it checks the L2 and L3 cache and then finally it checks the ram the purpose of a cach is to reduce the average time to Access Data that's why we store frequently used data here to optimize CPU performance and what about the CPU CPU is the brain of the computer it fetches decodes and executes instructions when you run your code it's the CPU that processes the operations defined in that program but before it
can run our code which is written in high level languages like Java C++ python or other languages our code first needs to be compiled into machine code a compiler performs this translation and once the code is compiled into machine code the CPU can execute it it can read and write from our Ram disk and cach data and finally we have motherboard or main board which is what you might think of as the component that connects everything it provides the path phase that allow data to flow between these components now let's have a look at the
very high level architecture of a production ready up our first key area is the cicd pipeline continuous integration and continuous deployment this ensures that our code goes from the repository through a series of tests and pipeline checks and onto the production server without any manual intervention it's configured with platforms like Jenkins or GitHub actions for automating our deployment processes and once our app is in production it has to handle lots of user requests this is managed by our load balancers and reverse proxies like ngx they ensure that the user request are evenly distributed across multiple
servers maintaining a smooth user experience even during traffic specs our server is also going to need to store data for that we also have an external storage server that is not running on the same production server instead it's connected over a network our servers might also be communicating with other servers as well and we can have many such services not just one to ensure everything runs smoothly we have logging and monitoring system s keeping a Keen Eye on every micro interaction of storing logs and analyzing data it's standard practice to store logs on external Services
often outside of our primary production server for the back end tools like pm2 can be used for logging and monitoring on the front end platforms like Sentry can be used to capture and Report errors in real time and when things don't go as plann meaning our logging systems detect failing requests or anomalies first it enforce our alerting service after that push notifications are sent to keep users informed from generic something rank wrong to specific payment failed and modern practice is to integrate these alerts directly into platforms we commonly use like slack imagine a dedicated slack
Channel where alerts pop up at the moment an issue arises this allows developers to jump into action almost instantly addressing the root CS before it escalates and after that developers have to debug the issue first and foremost the issue needs to be identified those logs we spoke about earlier they are our first Port of Call developers go through them searching for patterns or anomalies that could point to the source of the problem after that it needs to be replicated in a safe environment the golden rule is to never debug directly in the production environment instead
developers recreate the issue in a staging or test environment this ensures users don't get affected by the debugging process then developers use tools to peer into the running app apption and start debugging once the bug is fixed a hot fix is rolled out this is a quick temporary fix designed to get things running again it's like a patch before a more permanent solution can be implemented in this section let's understand the pillars of system design and what it really takes to create a robust and resilent application now before we jump into the technicalities let's talk
about what actually makes a good design when we talk about good design in system architecture we are really focusing ing on a few key principles scalability which is our system growth with its user base maintainability which is ensuring future developers can understand and improve our system and efficiency which is making the best use of our resources but good design also means planning for failure and building a system that not only performs well when everything is running smoothly but also maintains its composure when things go wrong at the heart of system design are three key elements
moving data storing data and transforming data moving data is about ensuring that data can flow seamlessly from one part of our system to another whether it's user request seeding our servers or data transfers between databases we need to optimize for Speed and security storing data isn't just about choosing between SQL or nosql databases it's about understanding access patterns indexing strategies and backup Solutions we need to ensure that our data is not only stored securely but is also readily available when needed and data transformation is about taking row data and turning it into meaningful information whether
it's aggregating log files for analysis or converting user input into a different format now let's take a moment to understand the crucial Concept in system design the cap theorem also known as Brewers theorem named after computer scientist Eric Brewer this theorem is a set of principles that guide us in making informed tradeoffs between three key components of a distributed system consistency availability and partition tolerance consistency ensures that all nodes in the distributed system have the same data at the same time if you make a change to one node that change should also be reflected across
all nodes think of it like updating a Google doc if one person makes an edit everyone else sees that edit immediately availability means that the system is is always operational and responsive to requests regardless of what might be happening behind the scenes like a reliable online store no matter when you visit it's always open and ready to take your order and partition tolerance refers to the system's ability to continue functioning even when a network partition occur meaning if there is a disruption in communication between nodes the system still works it's like having a group chat
where even if one person loses connection the rest of the group can continue chatting and according to cap theorem a distributed system can only achieve two out of these three properties at the same time if you prioritize consistency and partition tolerance you might have to compromise on availability and vice versa for example a banking system needs to be consistent and partition tolerant to ensure Financial accuracy even if it means some transactions take longer to process temporarily compromising availability so every design DEC decision comes with tradeoffs for example a system optimized for read operations might perform
poorly on write operations or in order to gain performance we might have to sacrifice a bit of complexity so it's not about finding the perfect solution it's about finding the best solution for our specific use case and that means making informed decision about where we can afford to compromise so one important measurement of system is availability this is the measure of systems operational performance and reliability when we talk about availability we are essentially asking is our system up and running when our users need it this is often measured in terms of percentage aiming for that
golden 5 9's availability let's say we are running a critical service with 99.9 availability that allows for around 8.76 hours of downtime per year but if we add two NES to it we are talking just about 5 minutes of downtime per year and that's a massive difference especially for services where every second counts we often measure it in terms of uptime and downtime and here is where service level objectives and service level agreements come into place slos are like setting goals for our systems performance and availability for example we might set an SLO stating that
our web service should respond to request within 300 milliseconds and 99.9% of the time slas on the other hand are like for formal contracts with our users or customers they Define the minimum level of service we are committing to provide so if our SLA guarantees 99.99 availability and we drop below that we might have to provide refunds or other compensations to our customers building resilence into our system means expecting the unexpected this could mean implementing redundant systems ensuring there is always a backup ready to take over in case of failure or it could mean designing
our system to degrade gracefully so even if certain features are unavailable the core functionality remains intact to measure this aspect we used reliability fault tolerance and redundancy reliability means ensuring that our system works correctly and consistently fa tolerance is about preparing for when things go wrong how does our system handle unexpected failures or attacks and redundancy is about having backups ensuring that if one part of our system fails there is another ready to take its place we also need to measure the speed of our system and for that we have throughput and latency throughput measures
how much data our system can handle over a certain period of time we have server throughput which is measured in requests per second this metric provides an indication of how many client requests a server can handle in a given time frame a higher RPS value typically indicates better performance and the ability to handle more concurrent users we have database throughput which is measured in queries per second this quantifies the number of queries a database can process in a second like server throughput a higher QPS value usually signifies better performance and we also have data throughput
which is measured in bytes per second this reflects the amount of data transferred over a network or processed by a system in a given period of time on the other hand latency measures how long it takes to handle a single request it's the time it takes for a request to get a response and optimizing for one can often lead to sacrifices in the other for example batching operations can increase throughput but might also increase latency and designing a system poly can lead to a lot of issues down the line from performance bottlenecks to security vulnerabilities
and unlike code which can be refactored easily redesigning A system can be a Monumental task that's why it's crucial to invest time and resources into getting the design right from the start and laying a solid foundation that can support the weight of future features and user growth now let's talk about networking Basics when we talk about networking Basics we are essentially discussing how computers communicate with each other at the heart of this communication is the IP address a unique identifier for each device on a network IP V4 addresses are 32bit which allows for approximately 4
billion unique addresses however with the increasing number of devices we are moving to IP V6 which uses 128bit addresses significantly increasing the number of available unique addresses when two computers communicate over a network they send and receive packets of data and each packet contains an IP header which contains essential information like the senders and receivers IP addresses ensuring that the data reaches the correct destination this process is governed by the Internet Protocol which is a set of rules that defines how data is sent and received besides the IP layer we also have the application layer
where data specific to the application protocol is stored the data in these packets is formatted according to specific application protocol data like HTTP for web browsing so that the data is interpreted correctly by the receiving device once we understand the basics of Ip addressing and data packets we can dive into transport layer where TCP and UDP come into play TCP operates at the transport layer and ensures reliable communication it's like a delivery guy who makes sure that your package not only arrives but also checks that nothing is missing so each data packet also includes a
TCP header which is carrying essential information like port numbers and control flux necessary for managing the connection and data flow TCP is known for its reliability it ensures the complete and correct delivery of data packets it accomplishes this through features like sequence numbers which keep track of the order of packets and the process known as the freeway handshake which establishes a stable connection between two devices in contrast UDP is faster but less reliable than TCP it doesn't establish a connection before sending data and doesn't guarantee the delivery or order of the packets but this makes
UDP preferable for time sensitive Communications like video calls or live streaming where speed is crucial and some data loss is acceptable to tie all these Concepts together let's talk about DNS domain name system DNS acts like the internet form book translating human friendly domain names into IP addresses when you enter a URL in your browser the browser sends a DNS query to find the corresponding IP address allowing it to establish a connection to the server and and retrieve the web page the functioning of DNS is overseen by I can which coordinates the global IP address
space and domain name system and domain name registers like name chip or gold Ed are accredited by I can to sell domain names to the public DNS uses different types of Records like a records which map The Domain to its corresponding IP address ensuring that your request reaches to the correct server or 4 a records which map a domain name name to an IP V6 address and finally let's talk about the networking infrastructure which supports all these communication devices on a network have either public or private IP addresses public IP addresses are unique across the
internet while private IP addresses are unique within a local network an IP address can be stated permanently assigned to a device or dynamic changing over time Dynamic IP addresses are commonly used for res idential internet connections and devices connected in a local area network can communicate with each other directly and to protect these networks we are using firewalls which are monitoring and controlling incoming and outgoing Network traffic and within a device specific processes or services are identified by ports which when combined with an IP address create a unique identifier for a network service some ports
are reserved for specific protocols like 80 for HTTP or 22 for SSH now let's cover all the essential application layer protocols the most common protocol out of this is HTTP which stands for hyper text transfer protocol which is built on TCP IP it's a request response protocol but imagine it as a conversation with no memory each interaction is separate with no recollection of the past this means that the server doesn't have to store any context between requests instead each request contains all the necessary information and notice how the headers include details like URL and Method
while body carries the substance of the request or response each response also includes the status code which is just to provide feedback about the result of a client's request on a server for instance 200 series are success codes these indicate that the request was successfully received and processed 300 series are redirection codes this signify that further action needs to be taken by the user agent in order to fulfill the request 400 series are client error codes these are used when the request contains bad syntax or cannot be fulfilled and 500 series are server error codes
this indicates that something went wrong on the server we also have a method on each request the most common methods are get post put patch and delete get is used for fetching data post is usually for creating a data on server puted patch are for updating a record and delete is for removing a record from database HTTP is oneway connection but for realtime updates we use web sockets that provide a two-way Communication channel over a single long lift connection allowing servers to push real-time updates to clients this is very important for applications requiring constant data
updates without the overhead of repeated HTTP request response Cycles it is commonly used for chat applications live sport updates or stock market feeds where the action never stops and neither does the conversation from email related protocols SMTP is the standard for email transmission over the Internet it is the protocol for sending email messages between servers most email clients use SMTP for sending emails and either IMAP or pop free for retrieving them imup is used to retrieve emails from a server allowing a client to access and manipulate messages this is ideal for users who need to
access their emails from multiple devices pop free is used for downloading emails from a server to a local client typically used when emails are managed from a single device moving on to file transfer and management protocols the traditional protocol for transferring files over the Internet is FTP which is often used in Website Maintenance and large data transfers it is used for the trans of files between a client and server useful for uploading files to server or backing up files and we also have SSH or secure shell which is for operating Network Services securely on an
unsecured Network it's commonly used for logging into a remote machine and executing commands or transferring files there are also real-time communication protocols like web RTC which enables browser to browser applications for voice calling video chat and file Shar sharing without internal or external plugins this is essential for applications like video conferencing and live streaming another one is mqtt which is a lightweight messaging protocol ideal for devices with limited processing power and in scenarios requiring low bandwidth such as iot devices and amqp is a protocol for message oriented middleware providing robustness and security for Enterprise level
message communication for example it is used in tools like rabbit mq let's also talk about RPC which is a protocol that allows a program on one computer to execute code on a server or another computer it's a method used to invoke a function as if it were a local call when in reality the function is executed on a remote machine so it abstracts the details of the network communication allowing the developer to interact with remote functions seamlessly as if they were local to the application and many application player protocols use RPC mechanisms to perform their
operations for example in web services HTTP requests can result in RPC calls being made on backend to process data or perform actions on behalf of the client or SMTP servers might use RPC calls internally to process email messages or interact with databases of course there are numerous other application layer protocols but devance covered here are among the most commonly used Bo and essential for web development in this section let's go through the API design starting from the basics and advancing towards the best practices that Define exceptional apis let's consider an API for an e-commerce platform
like Shopify which if you're not familiar with is a well-known e-commerce platform that allows businesses to set up online stores in API design we are concerned with defining the inputs like product details for a new product which is provided by a seller and the output like the information returned when someone queries a product of an API so the focus is mainly on defining how the crow operations are exposed to the user interface CR stands for create read update and delete which are basic operations of any data driven application for example to add a new product
we need to send a post request to/ API products where the product details are sent in the request body to retrieve these products we need to send the get request requ EST to/ API SL products for updating we use put or patch requests to/ product/ the ID of that product and removing is similar to updating it's again/ product/ ID of the product we need to remove and similarly we might also have another get request to/ product/ ID which fetches the single product another part is to decide on the communication protocol that will be used like
HTTP websockets or other protocols and the data transport mechanism which can be Json XML or protocol buffers this is usually the case for restful apis but we also have graphql and grpc paradigms so apis come in different paradigms each with its own set of protocols and standards the most common one is rest which stands for representational State transfer it is stateless which means that each request from a client to a server must contain all the information needed to understand and complete the request it uses standard HTTP methods get post put and delete and it's easily
consumable by different clients browsers or mobile apps the downside of restful apis is that they can lead to over fetching or under fetching of data because more endpoints may be required to access specific data and usually restful apis use Json for data exchange on the other hand graphql apis allow clients to request exactly what they need avoiding over fetching and under fetching data they have strongly typed queries but complex queries can impact server performance and all the requests are sent as post requests and graphql API typically responds with HTTP 200 status code even in case
of errors with error details in the response body grpc stands for Google remote procedure call which is built on http2 which provides advanced featur features like multiplexing and server push it uses protocol buffers which is a way of serializing structured data and because of that it's sufficient in terms of bandwidth and resources especially suitable for microservices the downside is that it's less human readable compared to Json and it requires http2 support to operate in an e-commerce setting you might have relationships like user to orders or orders to products and you need to design endpoints to
reflect these relationships for example to fetch the orders for a specific user you need to query to get/ users SL the user id/ orders common queries also include limit and offset for pagination or start and end date for filtering products within a certain date range this allows users or the client to retrieve specific sets of data without overwhelming the system a well-designed get request should be itm ponent meaning calling it multiple times doesn't change the result and it should always return the same result and get requests should never mutate data they are meant only for
retrieval if you need to update or create a data you need to do a put or post request when modifying end points it's important to maintain backward compatibility this means that we need to ensure that changes don't break existing clients a common practice is to introduce new versions like version two products so that the version one API can still serve the old clients and version 2 API should serve the current clients this is in case of restful apis in the case of graph Co apis adding new Fields like V2 Fields without removing old one helps
in evolving the API without breaking existing clients another best practice is to set rate limitations this can prevent the API from Theos attacks it is used to control the number of requests a user can make in certain time frame and it prevents a single user from sending too many requests to your single API a common practice is to also set course settings which stands for cross origin resource sharing with course settings you can control which domains can access to your API preventing unwanted cross-site interactions now imagine a company is hosting a website on a server
in Google cloud data centers in Finland it may take around 100 milliseconds to load for users in Europe but it takes 3 to 5 Seconds to load for users in Mexico fortunately there are strategies to minimize this request latency for users who are far away these strategies are called caching and content delivery networks which are two important Concepts in modern web development and system design caching is a technique used to improve the performance and efficiency of a system it involves storing a copy of certain data in a temporary storage so that future requests for that
data can be served faster there are four common places where cash can be stored the first one is browser caching where we store website resources on a user's local computer so when a user revisits a site the browser can load the site from the local cache rather than fetching everything from the server again users can disable caching by adjusting the browser settings in most browsers developers can disable cach from the developer tools for instance in Chrome we have the disable cache option in the dev Vel opers tools Network tab the cach is stored in a
directory on the client's hard drive managed by the browser and browser caches store HTML CSS and JS bundle files on the user's local machine typically in a dedicated cache directory managed by the browser we use the cache control header to tell browser how long this content should be cached for example here the cache control is set to 7,200 seconds which is equivalent to 2 hours when the re ested data is found in the cache we call that a cash hit and on the other hand we have cash Miss which happens when the requested data is
not in the cash necessitating a fetch from the original source and cash ratio is the percentage of requests that are served from the cach compared to all requests and the higher ratio indicates a more effective cach you can check if the cash fall hit or missed from the xcash header for example in this case it says Miss so the cash was missed and in case the cash is found we will have here it here we also have server caching which involves storing frequently accessed data on the server site reducing the need to perform expensive operations
like database queries serers side caches are stored on a server or on a separate cache server either in memory like redis or on disk typically the server checks the cache from the data before quering the database if the data is in the cach it is returned directly otherwise the server queries the database and if the data is not in the cache the server retrieves it from the database returns it to the user and then stores it in the cache for future requests this is the case of right around cache where data is written directly to
permanent storage byp passing the cache it is used when right performance is less critical you also have write through cache where data is simultaneously written to cache and the permanent storage it ensures data consistency but can be slower than right round cache and we also have right back cach where data is first written to the cache and then to permanent storage at a later time this improves right performance but you have a risk of losing that data in case of a crush of server but what happens if the cash is full and we need to
free up some space to use our cash again for that we have eviction policies which are rules that determine which items to remove from the cash when it's full common policies are to remove least recently used ones or first in first out where we remove the ones that were added first or removing the least frequently used ones database caching is another crucial aspect and it refers to the practice of caching database query results to improve the performance of database driven applications it is often done either within the database system itself or via an external caching
layer like redies or M cache when a query is made we first check the cache to see if the result of that query has been stored if it is we return the cach state avoiding the need to execute the query against the database but if the data is not found in the cache the query is executed against the database and the result is stored in the cache for future requests this is beneficial for read heavy applications where some queries are executed frequently and we use the same eviction policies as we have for server side caching
another type of caching is CDN which are a network of servers distributed geographically they are generally used to serf static content such as JavaScript HTML CSS or image and video files they cat the content from the original server and deliver it to users from the nearest CDN server when a user requests a file like an image or a website the request is redirected to the nearest CDN server if the CDN server has the cached content it delivers it to the user if not it fetches the content from the origin server caches it and then forwards
it to the user this is the pool based type type of CDN where the CDN automatically pulls the content from the origin server when it's first requested by a user it's ideal for websites with a lot of static content that is updated regularly it requires less active management because the CDN automatically keeps the content up to date another type is push based CDs this is where you upload the content to the origin server and then it distributes these files to the CDN this is useful when you have large files that are infrequently updated but need
to be quickly distributed when updated it requires more active management of what content is stored on the edn we again use the cache control header to tell the browser for how long it should cach the content from CDN CDN are usually used for delivering static assets like images CSS files JavaScript bundles or video content and it can be useful if you need to ensure High availability and performance for users it can also reduce the load on the origin server but there are some instances where we still need to hit our origin server for example when
serving Dynamic content that changes frequently or handling tasks that require real-time processing and in cases where the application requires complex server side logic that cannot be done in the CDN some of the benefits that we get from CDN are reduced latency by serving content from locations closer to the user CDN significantly reduce latency it also adds High avail ability and scalability CDN can handle high traffic loads and are resilent against Hardware failures it also adds improved security because many CDN offer security features like DDS protection and traffic encryption and the benefits of caching are also
reduced latency because we have fast data retrieval since the data is fetched from the nearby cache rather than a remote server it lowers the server load by reducing the number of requests to the primary data source decreasing server load and overall faster load times lead to a better user experience now let's talk about proxy servers which act as an intermediary between a client requesting a resource and the server providing that resource it can serve various purposes like caching resources for faster access anonymizing requests and load balancing among multiple servers essentially it receives requests from clients
forwards them to the relevant servers and then Returns the servers respond back to the client there are several types of proxy servers each serving different purposes here are some of the main types the first one is forward proxy which sits in front of clients and is used to send requests to other servers on the Internet it's often used within the internal networks to control internet access next one is reverse proxy which sits in front of one or more web servers intercepting requests from the internet it is used for load balancing web acceleration and as a
security layer another type is open proxy which allows any user to connect and utilize the proxy server often used to anonymize web browsing and bypass content restrictions we also have transparent proxy types which passes along requests and resources without modifying them but it's visible to the client and it's often used for caching and content filtering next type is anonymous proxy which is identifiable as a proxy server but but does not make the original IP address available this type is used for anonymous browsing we also have distorting proxies which provides an incorrect original Ip to the
destination server this is similar to an anonymous proxy but with purposeful IP misinformation and next popular type is high anonymity proxy or Elite proxy which makes detecting the proxy use very difficult these proxies do not send X forwarded for or other identifying header and they ensure maximum anonymity the most commonly used proxy servers are forward and reverse proxies a forward proxy acts as a middle layer between the client and the server it sits between the client which can be a computer on an internal Network and the external servers which can be websites on the internet
when the client makes a request it is first sent to the forward proxy the proxy then evaluates the request and decides based on its configuration and rules whether to allow the request modify it or to block it one of the primary functions of a forward proxy is to hide the client's IP address when it forwards the request to the Target server it appears as if the request is coming from the proxy server itself let's look at some example use cases of forward proxies one popular example is Instagram proxies these are a specific type of forward
proxy used to manage multiple Instagram accounts without triggering bonds or restrictions and marketers and social media managers use Instagram proxies to appear as if they are located in different area or as different users which allows them to manage multiple accounts automate tasks or gather data without being flaged for suspicious activity next example is internet use control and monitoring proxies some organizations use forward proxies to Monitor and control employee internet usage they can block access to non-related sites and protect against web based threats they can also scan for viruses and malware in incoming content next common
use case is caching frequently accessed content forward proxies can also cach popular websites or content reducing bandwidth usage and speeding up access for users within the network this is especially beneficial in networks where bandwidth is costly or limited and it can be also used for anonymizing web access people who are concerned about privacy can use forward proxies to hide their IP address and other identifying information from websites they Vis it and making it difficult to track their web browsing activities on the other hand the reverse proxy is a type of proxy server that sits in
front of one or more web servers intercepting requests from clients before they reach the servers while a forward proxy hides the client's identity a reverse proxy essentially hides the servers Identity or the existence of multiple servers behind it the client interacts only with the reverse proxy and may not know about the servers behind it it also distributes client requests across multiple servers balancing load and ensuring no single server becomes overwhelmed reverse proxy can also compress inbound and outbound data cache files and manage SSL encryption there be speeding up load time and reducing server load some
common use case cases of reverse proxies are load balancers these distribute incoming Network traffic across multiple servers ensuring no single server gets too much load and by Distributing traffic we prevent any single server from becoming a bottleneck and it's maintaining optimal service speed and reliability CDs are also a type of reverse proxies they are a network of servers that deliver cach static content from websites to users based on the geographical location of the user they act as Reverse proxies by retrieving content from the origin server and caching it so that it's closer to the user
for faster delivery another example is web application firewalls which are positioned in front of web applications they inspect incoming traffic to block hacking attempts and filter out unwanted traffic firewalls also protect the application from common web exploits and another example is SSL off loading or acceleration some reverse proxies handle the encryption and decryption of SSL TLS traffic offloading that task from web servers to optimize their performance load balancers are perhaps the most popular use cases of proxy servers they distribute incoming traffic across multiple servers to make sure that no server Bears Too Much load by
spreading the requests effectively they increase the capacity and reliability of applications here are some common strategies and algorithms used in load balancing first one is round robin which is the simplest form of load balancing where each server in the pool gets a request in sequential rotating order when the last server is reached it Loops back to the first one this type works well for servers with similar specifications and when the load is uniformly distributable next one is list connections algorithm which directs traffic to the server with the fewest active connections it's ideal for longer tasks
or when the server load is not evenly distributed next we have the least response time algorithm which chooses the server with the lowest response time and fewest active connections this is effective and the goal is to provide the fastest response to requests next algorithm is IP hashing which determines which server receives the request based on the hash of the client's IP address this ensures a client consistently connects to the same server and it's useful for session persistence in application where it's important that the client consistently connects to the same server the variance of these methods
can also be vited which brings us to the weighted algorithms for example in weighted round robin or weighted list connections servers are assigned weights typically based on their capacity or performance metrics and the servers which are more capable handle the most requests this is effective when the servers in the pool have different capabilities like different CPU or different Rams we also have geographical algorithms which direct requests to the server geographically closest to the user or based on specific Regional requirements this is useful for Global Services where latency reduction is priority and the next common algorithm
is consistent hashing which uses a hash function to distribute data across various nodes imagine a hash space that forms a circle where the end wraps around to the beginning often referred to as a has ring and both the nodes and the data like keys or stored values are hushed onto this ring this makes sure that the client consistently connects to the same server every time an essential feature of load balancers is continuous Health checking of servers to ensure traffic is only directed to servers that are online and responsive if a server fails the load balancer
will stop sending traffic to it until it is back online and load balancers can be in different forms including Hardware applications software Solutions and cloud-based Services some of the popular Hardware load balancers are F5 big IP which is a widely used Hardware load balancer known for its high performance and extensive feature set it offers local traffic management Global server load balancing and application security another example is Citrix forly known as net scaler which provides load balancing content switching and ation acceleration some popular software load balancers are AJ proxy which is a popular open-source software load
balancer and proxy server for TCP and HTTP based applications and of course Eng X which is often used as a web server but it also functions as a load balancer and reverse proxy for HTTP and other network protocols and some popular cloud-based load balancers are aws's elastic load balancing or microsof oft aure load balancer or Google Cloud's load balancer there are even some virtual load balancers like Vim ver Advanced load balancer which offers a softwar defined application delivery controller that can be deployed on premises or in the cloud now let's see what happens when a
load balancer goes down when the load balancer goes down it can impact the whole availability and performance of the application or Services it manages it's basically a single point of failure and in case it goes down all of the servers become unavailable for the clients to avoid or minimize the impact of a load balancer failure we have several strategies which can be employed first one is implementing a redundant load balancing by using more than one load balancer often in pairs which is a common approach if one of them fails the other one takes over which
is a method known as a failover next strategy is to continuously monitor and do health checks of load balancer itself this can ensure that any issues are detected early and can be addressed before causing significant disruption we can also Implement Autos scaling and selfhealing systems some Modern infrastructures are designed to automatically detect the failure of load balancer and replace it with the new instance without manual intervention and in some configurations the NS failover can reroute traffic away from an IP address that is is no longer accepting connections like a failed load balancer to a preconfigured
standby IP which is our new load balancer system design interviews are incomplete without a deep dive into databases in the next few minutes I'll take you through the database Essentials you need to understand to a that interview we'll explore the role of databases in system design sharding and replication techniques and the key ACD properties we'll also discuss different types of databases vertical and horizontal scaling options and database performance techniques we have different types of databases each designed for specific tasks and challenges let's explore them first type is relational databases think of a relational database like
a well organized filling cabinet where all the files are neatly sorted into different drawers and folders some popular examples of SQL databases are poster SQL MySQL and SQL light all of the SQL databases use tables for data storage and they use SQL as a query language they are great for transactions complex queries and integrity relational databases are also acid compliant meaning they maintain the ACD properties a stands for atomicity which means that transactions Are All or Nothing C stands for consistency which means that after a transaction your database should be in a consistent state I
is isolation which means that transactions should be independent and D is for durability which means that once transaction is committed the data is there to stay we also have nosql databases which drop the consistency property from the ACD imagine a nosql database as a brainstorming board with sticky notes you can add or remove notes in any shape of form it's flexible some popular examples are mongod DB Cassandra and redis there are different different types of nosql databases such as key value pairs like redis document based databases like mongod DB or graph based databases like Neo
4G nosql databases are schema less meaning they don't have foreign Keys between tables which link the data together they are good for unstructured data ideal for scalability quick iteration and simple queries there are also inmemory databases this is like having a whiteboard for quick calculations and temporary sketches it's fast because everything is in memory some examples are redies and M cach they have lightning fast data retrieval and are used primarily for caching and session storage now let's see how we can scale databases the first option is vertical scaling or scale up in vertical scaling you
improve the performance of your database by enhancing the capabilities of individual server where the data is running this could involve increasing CPU power adding more RAM adding faster or more dis storage or upgrading the network but there is a maximum limit to the resources you can add to a single machine and because of that it's very limited the next option is horizontal scaling or scale out which involves adding more machines to the existing pool of resources rather than upgrading the single unit databases that support horizontal scaling distribute data across a cluster of machines this could
involve database sharding or data replication the first option is database sharding which is Distributing different portions shards of the data set across multiple servers this means you split the data into smaller chunks and distribute it across multiple servers some of the sharding strategies include range based sharding where you distribute data based on the range of a given key directory based sharding which is utilizing a lookup service to direct traffic to the correct database we also have geographical charting which is splitting databases based on geographical locations and the next horizontal scaling option is data replication this
is keeping copies of data on multiple servers for high availability we have Master Slave replication which is where you have one master database and several read only slave databases or you can have master master application which is multiple databases that can both read and write scaling your data database is one thing but you also want to access it faster so let's talk about different performance techniques that can help to access your data faster the most obvious one is caching caching isn't just for web servers database caching can be done through inmemory databases like redies you
can use it to cat frequent queries and boost your performance the next technique is indexing indexes are another way to boost the performance of your database creating an index for frequently accessed column will significantly speed up retrieval times and the next technique is query optimization you can also consider optimizing queries for fast data access this includes minimizing joints and using tools like SQL query analyzer or explain plan to understand your query's performance in all cases you should remember the cap theorem which states that you can only have two of these three consistency availability and partition
tolerance when designing a system you should prioritize two of the is based on the requirements that you have given in the interview if you enjoyed this crash course then consider watching my other videos about system Design Concepts and interviews see you next time