When to Use Kafka or RabbitMQ | System Design

Interview Pen
Visit Our Website: https://interviewpen.com/?utm_campaign=queues Join Our Discord (24/7 help): https...
Video transcript:
So in this video we're going to be talking about the difference between Kafka and RabbitMQ, or more broadly, stream processing systems and traditional message queues. With distributed systems, a common mistake is thinking these two kinds of systems are interchangeable, but they actually serve very different purposes, and using one when you should be using the other can cause a lot of problems down the road. So let's take a look at the main differences in their design.

Kafka is inherently a stream processing system: it's designed to take in a stream of events and send them off to a bunch of different consumers of those events. Kafka has very high throughput, and it keeps all messages around until their time to live expires, so even messages that have already been consumed can be replayed later. Kafka is also fan-out by default, so if we have multiple consumers connected to the same queue, each consumer gets one copy of each record.

Traditional queues such as RabbitMQ, on the other hand, are designed to be message queuing systems: they take in messages, queue them until they're ready to be processed, and then send them off to a processor. Traditional queues can handle complex message routing, which is really handy if we want to route a specific message to specific queues based on its properties. Traditional queues also generally treat each message as destined for one consumer: whereas Kafka fans out, with a traditional message queue, if you have multiple consumers connected to one queue, each message is routed to exactly one consumer, not every consumer. Traditional message queues are designed for moderate data volumes; they're still very fast, but they don't handle the same throughput that Kafka does.
So those are some high-level differences in how these two systems are designed, but now let's dive into the lower-level details of what makes them that way.

The first thing we're going to take a look at here is consumer patterns. As we mentioned before, Kafka is fan-out while traditional message queues are not. If we have, for example, three consumers connected to one queue and we send three messages into that queue, with Kafka each consumer will receive all three messages, while with RabbitMQ each consumer will receive one of the three messages. Kafka and RabbitMQ both support doing this the other way around as well, but it requires a little more setup and isn't quite as scalable.

The Kafka pattern of fanning out is really useful when we have events coming in and we want to distribute them to multiple dependent services. For example, we could have a stream of events coming in, and we want to send those events to logging, run some data analytics on them, and push real-time updates to our users. Each event that comes in needs to be processed by every one of these services; however, the individual job of each service is relatively small and likely doesn't need to be scaled horizontally very much. This means most of our messages are simply fanning out to each of these three consumers.
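As a concrete sketch of that pattern, here's roughly what one of those consumers could look like using the kafka-python client; the topic name, group ID, and broker address are assumptions for illustration. The key point is that each downstream service subscribes under its own consumer group, so each group receives a full copy of the stream:

```python
# Sketch: Kafka fan-out via consumer groups (kafka-python client).
# The logging, analytics, and real-time-update services would each run
# a consumer like this with a distinct group_id, and every group gets
# its own copy of every record on the topic.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                            # assumed topic name
    bootstrap_servers="localhost:9092",  # assumed broker address
    group_id="analytics-service",        # unique group => full copy of the stream
    auto_offset_reset="earliest",
)

for record in consumer:
    print(record.offset, record.value)  # process each event
```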
RabbitMQ, on the other hand, is great for situations where we have messages coming in and we need to process each of them once. In this example we have two processors, and any message that comes in needs to be read by one of those two processors. We could expand these processors out to hundreds of replicas and we wouldn't really have any problems, because RabbitMQ handles distributing each message to exactly one of those processors, as sketched below. If you want to see more examples of how distributed queues are used in real-world situations, you should check out our systems end-to-end course on interviewpen.com, where we have a ton of in-depth examples of how these systems are used.
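Here's that worker pattern as a minimal sketch with the pika client, assuming a queue named "jobs" on a local broker; every replica runs the same loop, and each message reaches exactly one of them:

```python
# Sketch: a RabbitMQ work queue (pika client) where each message is
# delivered to exactly one of many worker replicas.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="jobs")  # assumed queue name

def handle_job(ch, method, properties, body):
    # Only this worker sees this particular message; other replicas
    # running the same code receive the other messages in the queue.
    print("processing", body)

# auto_ack keeps the sketch short; acknowledgements are covered later on.
channel.basic_consume(queue="jobs", on_message_callback=handle_job, auto_ack=True)
channel.start_consuming()  # blocks; run one copy per worker replica
```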
All right, the next thing we're going to take a look at here is message routing. With Kafka, all message routing is handled by the producer. A Kafka setup can consist of multiple queues organized into topics and partitions, and the producer is solely responsible for determining which queue its data goes into. This has the advantage that the Kafka cluster itself doesn't have to do a lot of work, which is part of what allows it to handle such high throughput. On the producer side, we're able to send messages to one or many queues based on properties of the message. And if we have a situation where we don't want to fan out, and we instead want multiple partitions with multiple consumers that each get one message, we can do that on the producer side as well by having the producer hash the message to determine which partition it goes into. This also lets us scale our Kafka cluster better, because there's no single point that all messages have to pass through to be routed. One of the problems with this, however, is that once a message has been produced, we have no control over where it actually goes.
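Here's a minimal sketch of that producer-side routing with the kafka-python client; the topic name and key are assumptions. The client library hashes the key locally to pick a partition, so the brokers never make a routing decision:

```python
# Sketch: producer-side routing in Kafka (kafka-python client).
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed broker

# Records with the same key always hash to the same partition, so one
# consumer in a group sees all events for a given customer, in order.
producer.send("orders", key=b"customer-42", value=b'{"item": "book"}')
producer.flush()  # block until the broker acknowledges the send
```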
RabbitMQ, on the other hand, introduces exchanges, which take in all of the messages and route them to different queues. An exchange can do things like routing a message to one of two queues based on properties of that message, and it can also duplicate messages across multiple queues to enable a fan-out-style approach. What's nice about this is that our consumers now have control over which messages they consume from a queue. In a situation where we're not fanning out, this lets us balance the load between multiple consumers much better, especially for tasks that take variable amounts of time.
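As a rough sketch of that broker-side routing, here's what declaring a direct exchange and its bindings could look like with the pika client; the exchange name, queue names, and routing keys are all assumptions for illustration:

```python
# Sketch: broker-side routing with a RabbitMQ direct exchange (pika client).
# The exchange, not the producer, decides which queue(s) get a message.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="tasks", exchange_type="direct")
channel.queue_declare(queue="image-jobs")
channel.queue_declare(queue="video-jobs")

# Bindings tell the exchange how to route based on each message's routing key.
channel.queue_bind(queue="image-jobs", exchange="tasks", routing_key="image")
channel.queue_bind(queue="video-jobs", exchange="tasks", routing_key="video")

# The producer only names the exchange and a key; the broker does the routing.
channel.basic_publish(exchange="tasks", routing_key="image", body=b"resize cat.png")
# A "fanout" exchange would instead copy each message to every bound queue.
```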
So let's look at what this actually means for whether you should use Kafka or RabbitMQ. Kafka is designed to take in uniform messages that require a short time to process. These are things like streaming events, where we want the data to go to multiple places at once and decouple systems, but there isn't a huge cost associated with processing any single message. Kafka does well when messages are being fanned out to many systems, so if we have multiple independent systems all looking at the same pieces of data, Kafka is a good fit. Kafka is also very, very fast, so if we really need extremely high throughput, Kafka is going to be a necessity.

Traditional message queues like RabbitMQ, on the other hand, are great for long-running tasks where we don't know how long they're going to take to complete. Traditional queues can also handle complex routing really well, which is useful in certain situations. Traditional queues are also really good with sporadic or bursty data flow: Kafka is designed to have consistent data moving through the system at all times, and the traditional message queue model tends to work well when we don't have that.
Now the final piece we're going to talk about here is acknowledgement. If something goes wrong and a consumer fails to process a message, we need some way to retry and send that message off to a different consumer, and that's where acknowledgements come into play.

With the Kafka model we don't actually have acknowledgements; instead we have offsets. Kafka logs an offset recording how far into the queue each consumer has read so far, and whenever a consumer needs new data, it fetches from the queue at that offset, so it gets all new data from the last position it read. Once it's done processing that batch of data, it commits its offsets to tell Kafka that it successfully processed the data. If a consumer disconnects before committing its offsets, Kafka can automatically hand that data to another consumer, because that consumer will just pick up from the last committed offset.
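A minimal sketch of that offset-commit loop with the kafka-python client, assuming a topic named "events" and a hypothetical "log-writer" consumer group:

```python
# Sketch: manual offset commits in Kafka (kafka-python client). Offsets
# are committed only after a batch is fully processed, so if this
# consumer crashes first, another one in the group replays the batch
# from the last committed offset.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="log-writer",
    enable_auto_commit=False,  # we commit explicitly below
)

while True:
    batch = consumer.poll(timeout_ms=1000)  # fetch from the stored offset
    for partition, records in batch.items():
        for record in records:
            print("processing", record.offset, record.value)
    consumer.commit()  # tell Kafka this batch was processed successfully
```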
RabbitMQ, on the other hand, actually does have acknowledgements. A consumer polls the queue for new data, and when it finishes processing that data, it sends an acknowledgement back to the queue to say it successfully processed that record. RabbitMQ waits for that acknowledgement, and if it doesn't receive one within a certain period of time, it goes and sends that data out to another consumer.
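Here's what that acknowledgement flow could look like as a minimal pika sketch; the queue name is an assumption, and basic_nack is one way to signal an explicit failure:

```python
# Sketch: manual acknowledgements in RabbitMQ (pika client). If a worker
# dies before calling basic_ack, RabbitMQ redelivers its message to
# another consumer.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="jobs")  # assumed queue name
channel.basic_qos(prefetch_count=1)  # one unacked message at a time

def handle_job(ch, method, properties, body):
    try:
        print("working on", body)  # a potentially long-running task
        ch.basic_ack(delivery_tag=method.delivery_tag)  # report success
    except Exception:
        # Report failure and requeue so another worker can retry the task.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

channel.basic_consume(queue="jobs", on_message_callback=handle_job)
channel.start_consuming()
```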
These two approaches accomplish a similar goal, but the traditional RabbitMQ model tends to work better when we have long-running tasks and we need to acknowledge those tasks as completed or failed. The Kafka model of committing offsets is great when we have to process batches of data, when we have a large quantity of small events coming in. A lot of the systems people use Kafka for can also tolerate messages being dropped to some extent.

So to recap, let's take a look at some use cases and whether you would use Kafka or RabbitMQ for them. Kafka, being a stream processing system, is really good at streaming data analysis: if we have a stream of real-time data coming in and we want to do some analysis on that data as it arrives, there are a lot of tools that can do that using Kafka. Kafka is also great for the event bus model, where we have events coming in from different pieces of a system and we want to send those events out to multiple independent systems that all want to capture that data. Kafka is also used a lot for logging, where we have a stream of logs coming into our system and we want to capture those logs in a database and maybe do some other processing on them in different systems; this is a good example of a consistent stream of a lot of small data coming in. And finally, Kafka is good at real-time communication, for when we want to stream events to our users in real time.

Traditional queues are great when we have messages that actually do need to be queued, for example in a job worker system: we might have a cluster of a ton of different workers all processing jobs, and we want to queue those messages and let the workers process them one at a time as they're ready. Traditional queues are also great for decoupling microservices when we just need simple communication between two different services; RabbitMQ will handle all of the error handling and scaling challenges associated with that behind the scenes.

If you enjoyed this video, you can find more content like this on interviewpen.com. We have tons more in-depth system design and data structures and algorithms content for any skill level, along with a full coding environment and an AI teaching assistant. You can also join our Discord, where we're always available to answer any questions you might have. If you or a friend wants to master the fundamentals of software engineering, check us out at interviewpen.com.