Streaming Audio: Apache Kafka® & Real-Time Data

Disaster Recovery with Multi-Region Clusters in Confluent Platform ft. Anna McDonald and Mitch Henderson

August 17, 2020 Confluent, original creators of Apache Kafka® Season 1 Episode 115

Multi-Region Clusters improve high availability in Apache Kafka®, ensure cluster replication across multiple zones, and help with disaster recovery. 

Making sure users are successful in every area of their Kafka deployment, be it operations or application development for specific use cases, is what Anna McDonald (Team Lead Customer Success Technical Architect) and Mitch Henderson (Principal Customer Success Technical Architect) are passionate about here at Confluent.

In this episode, they share common challenges that users often run into with Multi-Region Clusters, use cases for them, and what to keep in mind when considering replication.

Anna and Mitch also discuss consuming from followers, auto client failover, and offset issues to be aware of.

Tim Berglund (00:00):
There are various ways of running Kafka across multiple regions and various reasons why it's hard to do. I got Mitch Henderson and Anna McDonald back on the show this time together to tell us about Confluent Platform Multi-Region Clusters and how it solves those problems. We dig into a lot of Kafka goodness, too. On this episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund (00:30):
Hello, and welcome back to another episode of Streaming Audio. I am your host, Tim Berglund, and I am, I think, particularly delighted today to be joined in the virtual studio by two guests, Anna McDonald and Mitch Henderson. Anna, Mitch, welcome to the show.

Mitch Henderson (00:45):
Thank you, Tim.

Anna McDonald (00:46):
Thank you very much, Tim. Hello.

Tim Berglund (00:47):
Yeah, I just realized, I welcomed you both to the show at the same time. Um, there's a media access control problem that I introduced there, right? You had to contend for access to the shared medium and you both kind of did this delay thing. Um, so that's, I think that's a good lesson already. It means you're more polite than Ethernet, right? Ethernet would be like, go, and if there's a collision you back off and, you know, all that kind of thing. We're like IPX/SPX collision avoidance.

Tim Berglund (01:21):
Yeah. Yeah. There you go. Yeah. You want to try to avoid the collision rather than just be like, Hey, this medium is mine. Oh, wait. It's not on, I'll wait for it sometime.

Anna McDonald (01:29):
I'm just surprised that I was allowed back on. So I think maybe I'm just a little awestruck, yeah.

Tim Berglund (01:35):
No, you're definitely allowed back on this show.

Mitch Henderson (01:39):
How could you not be awestruck by Tim? I'm always awestruck by Tim.

Anna McDonald (01:43):
True story, hashtag yes.

Tim Berglund (01:46):
You guys, stop. Anyway. So definitely we will have a link in the show notes to Anna's previous appearance, which was at the very beginning of her employment at Confluent. And, uh, it was a Halloween-themed episode and it was frankly amazing. And Anna, you were the creative impetus behind that episode. I think you need full credit for that. I hope you got that in the original show, but that's just how it is.

Tim Berglund (02:11):
But what we're going to talk about today is a feature of Confluent Platform called Multi-Region Clusters, uh, affectionately known as MRC. Uh, and I always like to put that up front, uh, not so that you can stop listening because, well, they're talking about some lame commercial thing and it's just going to be a big advertisement, 'cause that's not really how we do it here. Um, but we are talking about a, uh, straight-up proprietary feature, right? This is the thing that you can get if you buy Confluent Platform for money. Um, but it is very interesting, and in dissecting it, which you two are quite capable of doing, um, you learn about, uh, just the way Kafka functions across regions and across data centers. I'll just say data centers for short. Um, there really is a lot of cool stuff to dig into here. So, um, I think you're gonna find the conversation interesting.

Tim Berglund (03:08):
If you are a person who is offended by the fact that there are commercial products, you should still listen because these two are fun and they're good. They're smart. They're gonna talk about great things. Um, even though this is a commercial feature, it is built upon kind of the idea of a stretch cluster, which is a bit of open source. Yeah, exactly. Which we're going to, we're going to talk about what those are. Uh, but first we need to know a little bit more about you two. Um, Anna and Mitch, you are both, um, technical account managers at Confluent. Uh, what does a TAM do?

Anna McDonald (03:43):
We have a new name now? Should we, should we say that? Yeah.

Tim Berglund (03:47):
Okay. Yeah, right there, there are new acronyms. Go ahead and drop that.

Anna McDonald (03:52):
It's called siesta, which is the only reason why I really like it, because it makes me laugh. No, it's, uh, it's, you're gonna, I'm gonna mess it up. It's a customer success technical architect. Is that right, Mitchell?

Mitch Henderson (04:05):
That is correct.

Anna McDonald (04:06):
Yes. What, you get a cookie.

Tim Berglund (04:09):
Okay, you're customer success technical architects. Um, that's pretty cool. What do you do?

Mitch Henderson (04:15):
So if you listen to the podcast, I did, we actually talked about this one.

Tim Berglund (04:20):
That's right. You talked all about being a TAM. That's okay. Worst test ever. Mitch, I forgot that. But, uh, we're also going to link to that.

Anna McDonald (04:29):
Was it? It wasn't a forgettable episode. Sorry. Hashtag bro.

Tim Berglund (04:35):
A little bit. Yeah. But, uh, remind, uh, quick Mitch quick summary. Um, what does a customer success technical architect do?

Mitch Henderson (04:43):
Yeah. Like Anna pointed out, the name siesta is really cool, right? Instead of saying we keep you from, like, messing up, now we can say we let you take a nap, or we let you take a siesta. So our job is to work with a wide variety of customers of Confluent and make sure that they're successful in all parts of the product, whether it's the operations part, the application development part, or just, you know, not being woken up at 3:00 PM or 3:00 AM because, uh, something happened.

Anna McDonald (05:18):
That's right. You're a napper.

Mitch Henderson (05:20):
I am a napper.

Tim Berglund (05:21):
Right.

Anna McDonald (05:21):
That's right.

Tim Berglund (05:22):
I am also a napper and there's no shame in that. Okay. So that's good. You help people who are, who are, you know, they have serious deployments, businesses relying on them. They can't go down. Um, and they may not have all the expertise to keep everything running perfectly all the time, reach out to you guys. And you do so you are people who are good at managing stressful situations and high-stakes situations and staying chill.

Anna McDonald (05:47):
The, um, I would like to just say since, since the last time I was on the show, um, I, my favorite thing about being a siesta is getting to see all of the use cases because we also help people before that with like, you know, application design event, streaming, how to like, you know, come into the new, beautiful, wonderful era that is, you know, as close to real time as possible just to be accurate and event streaming. So it's been super fun to see the way that people are using Kafka. Um, you know, since my time here now.

Tim Berglund (06:19):
Yeah. And that's the thing, by the way, that I enjoy hugely. Um, it's happened a lot less in the pandemic. Uh, but when I get to go and meet with customers in person, you know, usually the group of architects, and like, I don't know what their use case is going to be. It's, here's this new thing. And is it a slam dunk? Does it perfectly adapt itself to event streaming in all ways, and is it very obvious? Usually not, right? There are interesting edges, because people didn't design their business to adapt to event streaming. And so you have to do that adaptation. Uh, and I also love doing that. That's just a really, really fun thing. It feels, it feels risky. You know, when you begin the discussion, you're struggling to understand what their use case is and wondering, wow, do I even have answers for this person? But, you know, you come up with them.

Mitch Henderson (07:09):
That is one of my favorite things, seeing or helping someone, you know, integrate with an existing system and make it real time so that it's not, hey, I've got an update 24 hours later, but how do I integrate with this old mainframe thing and make it so it is real time, and then improve the processes around that so that all of the, you know, the streaming and eventing and all the fun things that Kafka brings get done.

Anna McDonald (07:35):
I also like it when we get hoodies, I've gotten some free hoodies. I'm excited about that.

Tim Berglund (07:43):
Yeah, no, I'm actually not short on hoodies.

Mitch Henderson (07:46):
T-shirts bobbleheads, whatever you got.

Tim Berglund (07:49):
Bobble, bobbleheads for sure. Uh, ball caps, stickers, um, hoodies. I actually cleaned my office recently. I'll be donating some hoodies, uh, to a charity and I got like a, a, still a reserve supply of them here. Anyway. Tell us about, um, and you know what, uh, I'll call out a person to answer each question, and if you don't like that, just pass the ball to the other person, but that way you don't have to go through the media access protocol every time I ask a question. I'll just, I'll try and balance them myself. But Anna, what is MRC? What is a multi-region cluster? And if I could just stack the questions up, um, I think Mitch, you mentioned stretch clusters. Also tell us what a stretch cluster is and tell us how they're different.

Anna McDonald (08:34):
So there's, yeah, so there's a couple of components. Like a stretch cluster, right, as the name might imply, is a cluster that's stretched over two of something. They can be two AZs, they can be two DCs, they can be two things that give you a semblance of resiliency, um, as opposed to running two separate clusters and having some sort of replication between them. Those are kind of like the two patterns in DR. Um, and MRC at its core is a stretch cluster that has some pretty interesting additions on there that kind of help you, at an operations level, ease the burden of deciding how do I have to map out my partitions, my, you know, replicas, all of these things in order to ensure resiliency. And Mitch, um, like right from the gate has done a ton of work with this. So I'm going to kick that over to Mitch to describe, um, exactly how that helps you.

Mitch Henderson (09:33):
Yeah, thanks. So as Anna was talking about, you know, this is kind of a superset of the traditional stretch cluster. If you've been around the Kafka community for a while, you've heard of stretch clusters where we, you know, put five brokers in one DC, five brokers in another DC, and stretch the ZooKeeper ensemble across them, and then they're synchronous. What that means is there are some nodes of the ZooKeeper ensemble in the first AZ, availability zone, and some in the other. Correct. So the way that ZooKeeper works and the way that, uh, kind of what defines a stretch cluster is that it's one Kafka cluster with really inconvenient physical locations.

Tim Berglund (10:19):
And therefore latency.

Anna McDonald (10:21):
That seems a rather negative attitude.

Tim Berglund (10:23):
Well, I don't know that we need that kind of negativity. Anna.

Anna McDonald (10:27):
I feel like I know, you know what.

Mitch Henderson (10:29):
Here it is. Uh, maybe, maybe they're just slightly inconvenient physical locations.

Tim Berglund (10:35):
And actually, if I may, uh, press you, um, Mr. Henderson, what goes wrong? I can imagine, you know, there's latency there, but is the problem in the brokers? Is the problem with the clients? It's probably ZooKeeper. What's so bad about a stretch cluster?

Mitch Henderson (10:52):
Yeah. So the bad part about a stretch cluster is that WAN in the middle. You know, if you've studied distributed systems for any amount of time, network links are generally the weak point, and anybody that's ever worked with a WAN link knows that they're bandwidth constrained, they're high latency, and they're kind of prone to failure a little bit, right? You can buttress them up with, you know, MPLS and all kinds of dual links and all these things, but at the end of the day, that tends to be less reliable than your local data center LAN. Uh, on the operations side for Kafka, what this really means is that it's a little bit harder to manage, right? With Kafka, you have to decide where those replicas are placed, how many partitions are in each data center, how many replicas are in each data center, where my clients are connecting to for leaders, because that's primarily where we write to, or that's the only place we write to. And then, uh,

Mitch Henderson (11:54):
The, uh, some of the other things to consider are where my consumers are located. And those have always been some of the challenges with a stretch cluster.

Tim Berglund (12:04):
Cause you likely don't want—

Anna McDonald (12:08):
Well, I was going to say, there's things you can do that ease that burden, right? Like fetch from followers. So there's been work done to make the penalty you pay for consuming from, you know, way, way far away a little bit easier, but it does require additional configuration, which, as we know, you know, it's great when things just work out of the box. So complexity, I think, is there: the way you ease some of the burdens is to increase complexity.

Tim Berglund (12:32):
Correct. Right. And fetch, fetch, let's just take these and go one at a time. I want to make sure that the, whatever, you know, whatever downsides, I guess I want to lean into Mitch's negativity, whatever downsides there are of stretch clusters, I want to make sure they're on the table. So a few things we've got. Um, Mitch, you mentioned replica placement, and I guess leader placement is the thing there. So my cluster, uh, exists over two data centers and clients are, and I'm just going to say data center instead of AZ, just to generalize it, because it could be on prem and they could be my data centers. They're certainly data centers, you know, whether they're managed by me or not. There are two locations, and my cluster is stretched across those two locations and ZooKeeper is managing life. Okay. But my clients, we're going to assume, are also in those two data centers. Now, before I go on, is that fair?

Tim Berglund (13:32):
That's okay. Right there. They're probably in one of those two datacenters?

Mitch Henderson (13:34):
That is an accurate statement.

Tim Berglund (13:37):
Okay. And each client is probably going to want to talk to, uh, a partition leader in its nearby data center, because it stinks to replicate across that WAN link, and even if the WAN link never fails, it is at least high latency. Right. So let's pretend it doesn't even fail for our scenario. It just has latency characteristics we don't like. Um, application performance is probably going to be, uh, a real problem if I am going over the WAN to talk to, uh, a replica leader. Is that basically the problem with, you know, you said you have to administer where the, uh, where replicas are placed, and that's kind of the issue, right?

Mitch Henderson (14:16):
Absolutely correct. Okay. It gets a little bit more nuanced. And I think we're going to talk about that here in a second, especially when you start talking about producers versus consumers.

Tim Berglund (14:26):
Got it. Um, ah, yeah.

Tim Berglund (14:28):
Right.

Anna McDonald (14:29):
Yeah. There's no, yeah, right. That there's consumers right there. You have options. Producers, you got nothing you have to produce to the leader.

Tim Berglund (14:38):
Right.

Tim Berglund (14:40):
Um, and consumers, you have an option. That feature is less than six months old, right? I don't remember the KIP number, but I think it was AK 2.4.

Anna McDonald (14:52):
Six months ago. That's so long, my goodness.

Tim Berglund (14:55):
Right. How could I remember?

Anna McDonald (14:56):
It is relatively new.

Tim Berglund (14:58):
Six months.

Tim Berglund (15:01):
Yeah. It feels like a long time ago. To me.

Anna McDonald (15:04):
It does.

Tim Berglund (15:06):
Yeah. That we knew of then every day. So, yeah. Alright. So with that KIP in place, uh, and we can get more into that in a minute, uh, that gives us options for consumers, but producers have to produce to the leader. And it's a bummer to do that across the WAN. What else goes wrong? Are there any gotchas? Like in terms of replication, does that just kind of work and, you know, writing with acks=all is a little slower? What's the deal there?

Mitch Henderson (15:42):
So can I erase some of my earlier negativity?

Tim Berglund (15:45):
You sure can.

Mitch Henderson (15:47):
So this is one of the big benefits of MRC. It uses Kafka's internal replication protocol. It doesn't require a connect node or a mirror maker or anything like that.

Mitch Henderson (15:58):
It just uses Kafka. So the replication protocol we know and love, and we know is battle-tested and works, and we know the characteristics of it and how it's going to behave, we inherit that. So whenever I'm replicating from DC one to DC two, I can absolutely know how that replication's going to happen and what settings affect how quickly that replication happens.

Tim Berglund (16:24):
Got it. Now we're still talking about stretch clusters. Can you elaborate, Anna, on reading from followers, consuming from followers? What's the deal with the KIP number Mitch gave, which I already forget in the course of this conversation? But just, just drop it on us.

Anna McDonald (16:44):
So, yeah, I mean, and it makes sense if you think about it. So, you know, in Kafka, a partition has a leader and has followers, and there's a list that says, okay, here's all of my followers that are in sync, right? Which means across the board, the latest offset that's available to use, it's the same high watermark. We love it. We know it. It's fantastic. And so there's not any problem in terms of consistency with a consumer that consumes from an in-sync replica, right? It's no different than consuming from the leader. It also, you know, side benefit a little bit, maybe, you know, spreads the load on whatever node is currently the leader for that partition a little bit. Um, and so it makes sense. So if you've got two DCs, one in California, let's say it's covered in avocado, not a fan, the other one's in New York, fantastic pizza and Italian food.

Anna McDonald (17:38):
Not that I'm, you know, in any way biased, and that's Western New York, my home and love of my life. So yeah, that would be even more tasty. Um, but I digress 'cause I'm hungry. So if you're in New York and the leader for the partition you're trying to consume is in California, you really don't, you want none of that avocado. It's not tasty, it's far away, and they put it on everything. You want to consume from New York, which has, you know, good food and straightforward people. So what you end up doing, you say, okay, I'm going to place my replicas such that I know there's availability in the New York DC for these consumers to fetch from a follower. And that way they don't have to go over to California and lose five pounds, right? They can stay in New York, they can consume from an in-sync replica, and it kind of solves a lot of those issues on the consumption side, um, with latency. Does that make, does that make sense and make people hungry?
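
To make the fetch-from-follower idea above concrete, here is a minimal toy model in Python. It is not Kafka's implementation. In Apache Kafka the feature (KIP-392, shipped in 2.4) is driven by configuration: `broker.rack` and `replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector` on the brokers, and `client.rack` on the consumer. The `Replica` class, the `pick_read_replica` function, and the rack names below are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str        # stands in for the broker's `broker.rack` value
    is_leader: bool
    in_sync: bool


def pick_read_replica(replicas: List[Replica], client_rack: str) -> Replica:
    """Prefer an in-sync replica in the consumer's own rack; otherwise use the leader."""
    for replica in replicas:
        if replica.in_sync and replica.rack == client_rack:
            return replica
    # No local in-sync replica: fall back to the partition leader.
    return next(r for r in replicas if r.is_leader)


# One partition: leader in California, one follower in New York.
partition = [
    Replica(1, "california", is_leader=True, in_sync=True),
    Replica(2, "california", is_leader=False, in_sync=True),
    Replica(3, "new-york", is_leader=False, in_sync=True),
]

chosen = pick_read_replica(partition, client_rack="new-york")
print(chosen.broker_id)
```

A consumer whose rack matches no replica (for example a hypothetical "texas" rack) falls back to the leader, which is the cross-WAN read the speakers are trying to avoid.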

Tim Berglund (18:37):
Yes, it does. You, you want to consume from nodes closest to you basically.

Mitch Henderson (18:43):
And there's one other side benefit there and that is the cost of WAN bandwidth is incredibly expensive.

Tim Berglund (18:50):
Ah, yes.

Mitch Henderson (18:51):
The less data I have to pull over that WAN, the better off I'm going to be when it comes time to pay the bill.

Tim Berglund (18:57):
Yep. This is the hidden, uh, you know, where they get ya in cloud pricing is egress costs, right? If traffic has to leave their network, uh, it's going to hurt you. Absolutely. Now, uh, now that I think we kind of know what a stretch cluster is and, uh, Anna, you explained why reading from followers is a good idea. You know, you want to read from the broker that's as close to you as possible, that's not over the WAN connection.

Tim Berglund (19:29):
And the ability to read from a follower means that the partition leader can be in the other data center, and you can read from a follower that's in your data center and love life in terms of latency. You're not going to pay that latency penalty. Uh, also, I didn't think of that, but you'll save on egress costs. You won't be traversing the WAN link. Um, Mitch, good point. Now, how is MRC different? And I will let this be a jump ball for the two of you. Now that we kind of know what a stretch cluster is, what's MRC?

Mitch Henderson (19:59):
I'll go first and I'm sorry to be rude. I just thought we'd go operations first and kind of the initial install and setup of a topic. Um, so what MRC really does is it provides an easy way to set up a Kafka cluster so that it takes advantage of all the things in a stretch cluster and makes it operationally easy. Right? Those things that we talked about, like replica placement, uh, fetch from follower, uh, you know, just the whole kind of install, configuration, get me going and get me writing an app. All that's taken care of for you in a much easier, uh, much more clearly defined way with MRC.

Tim Berglund (20:43):
Got it. Is the benefit primarily operational?

Anna McDonald (20:48):
Well, no, yes and no.

Tim Berglund (20:51):
Okay. There's a significant operational benefit.

Anna McDonald (20:56):
I'm just biased. I think the core MRC features

Mitch Henderson (21:00):
Like what makes MRC MRC, are aimed pretty much at the operations folks and making their lives easier, but making their lives easier means that we also make the application developers' lives much, much easier. So it's a win, win, win.

Tim Berglund (21:18):
Right. So what does MRC do to place those replicas? Can you kind of walk me through that, just in terms of describing the operational magic that it's doing that I don't have to think about? I'd like to at least be able to admire it once. So my personal favorite feature of MRC is default replica placement. What this does is it allows the applications to just, you know, instead of saying, I need four replicas or three replicas and letting the brokers kind of just take over with rack awareness and letting that do what it does and getting maybe not the optimal, and probably not the optimal, replica placement.

Tim Berglund (22:02):
It allows the application folks to just say, I don't know what to do. The brokers already probably know what to do, let them decide. So you can say on the brokers, put two replicas in DC one, two replicas in DC two, and that's all taken care of for you. So you get the optimal HA strategy for your replica placement.
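
As a rough sketch of what "two replicas in DC one, two replicas in DC two" can look like when written down, the Python below builds a placement document. The JSON shape (a `version` field plus a `replicas` list of `count` and `rack` constraints) and the `confluent.placement.constraints` property it would be supplied through follow my reading of the Confluent Platform documentation and are assumptions here; check them against the docs for your version. The helper name `placement_constraints` and the rack names `dc1` and `dc2` are hypothetical.

```python
import json
from typing import Dict


def placement_constraints(replicas_per_rack: Dict[str, int]) -> str:
    """Build a placement document asking for N replicas in each named rack."""
    doc = {
        "version": 1,  # assumed schema version, see the note above
        "replicas": [
            {"count": count, "constraints": {"rack": rack}}
            for rack, count in replicas_per_rack.items()
        ],
    }
    return json.dumps(doc, indent=2)


# Two replicas in each data center, as in the example from the conversation.
print(placement_constraints({"dc1": 2, "dc2": 2}))
```

The point of the sketch is only that placement is expressed declaratively per rack, so operators state the desired layout once instead of assigning replicas by hand.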

Tim Berglund (22:28):
And that's a, that would be topic level of metadata that you, where you put that configuration. Uh, no it's actually defined. You can define it both at the topic and you can define it as a default replica placement. So all,

Tim Berglund (22:44):
Yeah. That's money for any new topic.

Mitch Henderson (22:46):
Exactly.

Anna McDonald (22:47):
That's my favorite thing too.

Mitch Henderson (22:49):
For the teams out there with a, you know, a multitenant cluster, they don't have to go to each application team as they onboard and say, what are you trying to do? Where do you need the replicas? Here's why you're up at this.

Tim Berglund (23:01):
Explain this thing called MRC to you. It's really cool. No, wait, why won't you listen? Right? You don't, you don't want to have to do that.

Mitch Henderson (23:08):
Correct. You just set this on the broker and, you know, you tell your app teams to, instead of saying, I want one replica or three replicas, you just tell them to set it as negative one replicas and the broker defaults takeover.

Anna McDonald (23:24):
Yes. And so that's also, just for listeners at home, um, the reason you can do that is 'cause of KIP-464, which is a beautiful thing, last modified by Matthias J. Sax, shout out. And the cool thing is now you're able to pass in negative one to the admin client, which allows you to just leave it up to the broker. That's something that works in Kafka Streams. So for people out there who are used to having to go in and define internal topics in Kafka Streams, you can just say negative one and your replicas will be placed across a stretch cluster exactly as the broker defaults are, which is huge, because anyone who is a Kafka Streams aficionado, obviously you love it. Um, you know, internal topics get created and there can be lots of them. And having to kind of like either open a ticket every time, or try to rely on a less than optimal placement and then, you know, move them after, that's just really hard and not workable and not fun. And so this is amazing because it just basically allows, you know, whatever your operations team determines is the best placement, as an application developer you can go, okay, and just put negative one and roll with it.
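
The "negative one" convention described above can be modeled in a few lines of Python. This is a toy model of the fallback, not the broker's code: with KIP-464, a topic creation request that passes -1 for partitions or replication factor uses the broker settings `num.partitions` and `default.replication.factor`. The function name `resolve_topic_settings` and the sample values are hypothetical.

```python
from typing import Dict, Tuple

USE_BROKER_DEFAULT = -1  # sentinel a client passes to defer to the broker


def resolve_topic_settings(
    partitions: int,
    replication_factor: int,
    broker_config: Dict[str, int],
) -> Tuple[int, int]:
    """Replace -1 with the broker-side default for that setting."""
    if partitions == USE_BROKER_DEFAULT:
        partitions = broker_config["num.partitions"]
    if replication_factor == USE_BROKER_DEFAULT:
        replication_factor = broker_config["default.replication.factor"]
    return partitions, replication_factor


# Sample broker defaults chosen by a hypothetical operations team.
broker_config = {"num.partitions": 6, "default.replication.factor": 4}

print(resolve_topic_settings(-1, -1, broker_config))   # app defers on both
print(resolve_topic_settings(12, -1, broker_config))   # app sets partitions only
```

An application team that does care about partition count can still set it explicitly while leaving replication to whatever the operators configured.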

Tim Berglund (24:39):
Got it.

Mitch Henderson (24:40):
It's also important to point out, it's not just Kafka Streams. It's part of the admin client now.

Anna McDonald (24:46):
Would we say it's important? And I did say admin client.

Mitch Henderson (24:48):
Yes.

Anna McDonald (24:49):
And I gave you, and I gave the KIP number. It's a palindrome.

Mitch Henderson (24:53):
You can also thank Randall for getting it into Connect as well as, you know, the rest of the class.

Anna McDonald (24:58):
Hi Randall. Okay. If I'm a shoutout too.

Tim Berglund (25:01):
Yeah. But anything that's going to create topics, and Anna, like you said, Kafka Streams is most certainly going to create topics, then, um, you get that auto replica placement strategy, and that's, uh, that's not making any commitment to where the leader is, right? The leader election happens. And is leader election influenced by MRC, I guess I should ask? Um, but merely saying, I want this sort of placement, doesn't say where leaders go, right?

Mitch Henderson (25:31):
Preferred leader election still takes precedence. You can still say, as part of this, I prefer all of my leaders to be in this data center. Excellent. Which that is something that the application developers have to think about a little bit, at least where they deploy their apps.

Anna McDonald (25:48):
But yeah. And I would think, you know, there's a couple of patterns you see. Um, sometimes there's a pattern where you have application teams deployed on, you know, the easiest way to talk about it is like coast to coast, right? Because then it really does matter in terms of like latency. So there's some places that say, okay, yeah, we want a backup on the other coast, but we don't really have any, you know, consumers or producers or applications deployed there. We just want to back it up. So in that case, they'll centralize all their leaders in one DC. Yeah. Then there's a very obvious preferred leader. Exactly. But then there's other places where they're like, well, this, you know, this business case or use case, these applications, right, they're running over here on the West Coast, while this one's running on the East Coast. Right. So you have the flexibility to create, you know, different strategies if you need to, right? Just because you can use default replica placement doesn't mean you have to, if you're better served using a custom one. Got it. So for some, yeah, you know what I mean, some topics you could say, I want the leader over here, others, you maybe want it over here, depending on where your apps are running.

Tim Berglund (26:49):
Absolutely. And that's one of the reasons we made this default replica placement overridable by the clients. So the clients can still override this as they deploy their apps. But if they don't know, go with the default, as you should. All right. You and Anna, you were just starting to get into this, but I'll ask you, and again, you guys can throw the question back and forth as you like. Um, what are the cases that you see where the value of this thing is most obvious? Uh, and I just want to say, I feel like that's a little bit of a dorky question, 'cause it seems self-evident to me that you'd want disaster recovery. Um, but so I'm trying to word it specifically, like, who really cares about this?

Anna McDonald (27:33):
Well, no, I don't think that's silly at all, because it's expensive, and if you don't need it, like, basically it comes down to two concepts, right? What's your recovery time objective? And what's your recovery point objective? And I do this in my cough. Cause I'm gonna talk, I'm gonna do it again because I just thought of the best analogy for it. So let's say you're at home and you're watching a live stream. That's like a reunion of the Murder, She Wrote cast. Right. Which would be the coolest thing ever. I know there won't be the previous show. Well, no, it's a live stream, right? Yeah. I just picked it out of the air. I don't got no special affinity there. So basically you have to ask yourself, if the power goes out, how long can you stand it going out? Right. 10 minutes, an hour?

Anna McDonald (28:17):
Not at all. If not at all right, you need a backup generator to take over. So that's like your recovery time objective, how long can you afford to be down? And then your recovery point objective is how much of this live stream can I afford to miss like 10 minutes, you know, 20 none do I have to pick up right where I left off. Um, and many times for many use cases, those trend towards zero. Right? Um, and in that case, they, you know, if you looked at having two separate clusters, right, you're going to have to, you know, manually fail over applications. So it's gonna take time. Right. Um, and in a case where your RTO RPO are really close to zero, that drives people. I find the most to stretch clusters because you can get that with stretch clusters cause to clients it's just one big cluster.

Anna McDonald (29:10):
So as long as you're configured properly.

Tim Berglund (29:13):
Recovery time, objective, recovery point objective?

Anna McDonald (29:16):
Correct. Yup. Yup. Yup. And so, you know, those should really drive your requirements. Without knowing those, you really shouldn't make any decision whatsoever about your DR strategy. Um, because if you don't need those to be zero, then you have other options you can explore. And I always liked that, because if you think about it, right, there are times where you have a batch job and it fails and you don't care. Like, you'll restart it the next night. So I think in our industry, sometimes we pretend that there's like no cases where it's okay for something to fail. And that's clearly false. There are many cases. So, you know, it's always a good thing to start with, you know, what are your requirements?
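
The RTO/RPO reasoning above can be written as a tiny decision helper. This is only a sketch of the rule of thumb from the conversation (objectives near zero push you toward a stretch cluster, anything looser leaves other options open). The function name `suggest_dr_pattern` and the 30-second "near zero" threshold are assumptions made up for illustration, not guidance from the speakers.

```python
from datetime import timedelta


def suggest_dr_pattern(
    rto: timedelta,
    rpo: timedelta,
    near_zero: timedelta = timedelta(seconds=30),  # hypothetical threshold
) -> str:
    """Map recovery objectives to the two patterns discussed in the episode."""
    if rto <= near_zero and rpo <= near_zero:
        # Clients see one cluster, so there is no manual application failover.
        return "stretch cluster"
    # Looser objectives: two clusters with replication between them is an option.
    return "separate clusters with replication"


# A nightly batch job that can simply be rerun the next night.
print(suggest_dr_pattern(rto=timedelta(hours=24), rpo=timedelta(hours=24)))

# A workload that can tolerate almost no downtime or data loss.
print(suggest_dr_pattern(rto=timedelta(seconds=5), rpo=timedelta(seconds=0)))
```

The threshold is the part every team has to decide for itself, which is the point Anna makes: the requirements come first, the architecture second.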

Tim Berglund (29:57):
Okay. Yeah. And if they're zero, then, then you know, you're going to

Anna McDonald (30:00):
Close to zero.

Tim Berglund (30:02):
Yeah. Of course.

Anna McDonald (30:03):
Let's be mathematically accurate: a little limit as it approaches zero. How about that?

Mitch Henderson (30:08):
Whenever I'm out talking to folks about this one, I don't even call it DR at this point. It's just highly available, right? Your electric grid that Anna was talking about there is highly available, right? If a substation goes down, it gets rerouted through a different substation.

Tim Berglund (30:21):
And I think, Mitch, contra your negativity early in the interview, that's a much more positive way to frame it. You know, it's not that there's a disaster. It's just, you want to be available. I appreciate that. Yeah. Anna, you said client failover, and I want to drill down into that. It's again potentially an obvious thing, but let's walk through something. So, um, I am a client. I don't care whether I'm a producer or a consumer. I have a connection pool. And as, um, leader election proceeds due to whatever mechanisms might drive leader election, uh, I will get metadata updates from the cluster about where leaders are.

Tim Berglund (31:06):
Right. Good. So far.

Anna McDonald (31:08):
Yup.

Tim Berglund (31:08):
Um, and I have, uh, you know, good old-fashioned socket connections between the client and the broker. Those are pooled internally inside the client library. There's all this stuff going on inside the client, right? The API surface is so small, but there's a great deal of plumbing happening there, and there's this connection pool. Uh, the bummer is when there is a metadata change, but I don't know, right? When a broker actually fails, uh, I had a connection to the old broker and I'm trying to, say, produce a record to that broker. And there's going to be, I suppose, a TCP timeout followed by some sort of exception that, uh, gets thrown, and bad things happen. And there's some period of many seconds over which that takes place.

Tim Berglund (31:53):
Is that right?

Anna McDonald (31:53):
Yes. Yeah. So, yeah, like, when that happens, you could think about it this way too, right? One of the wonderful things about Kafka is you can take down a node for maintenance, and if you are configured with any semblance of propriety, uh, you will not take down your apps. Um, and so it's kind of a similar example, like you said. And as long as you configure your clients to be resilient in the face of retriable errors, of which that is one, they will try again until they do have a good leader elected, right? So I think what you're digging into is incredibly important, which is, it's the two coins of resilience: you have to have resiliency in your infrastructure and you have to have resiliency in your clients.
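[Editor's note: The "try again until there is a good leader" behavior can be illustrated with a toy retry loop. Real Kafka clients do this internally when retries are configured; the sketch below only simulates the idea with no broker involved, and `RetriableError`, `FlakySender`, and `send_with_retries` are hypothetical names invented for this example.]

```python
import time


class RetriableError(Exception):
    """Transient failure, e.g. the partition leader moved; safe to try again."""


class FlakySender:
    """Stand-in for a broker connection that fails a few times during a leader election."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.attempts = 0

    def __call__(self, record):
        self.attempts += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RetriableError("no leader for partition yet")
        return f"acked:{record}"


def send_with_retries(send, record, max_attempts=5, backoff_seconds=0.0):
    """Call send(record), retrying on RetriableError up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(record)
        except RetriableError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the error to the application
            time.sleep(backoff_seconds)


sender = FlakySender(failures_before_success=2)
result = send_with_retries(sender, "order-42")
print(result, "after", sender.attempts, "attempts")
```

[From the application's point of view the only visible effect is extra latency on that one send, which is the client-side half of the resilience Anna describes.]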

Mitch Henderson (32:33):
Got it. And where this really comes into play is that the clients are single cluster only. So if you're not stretched or yeah,

Anna McDonald (32:43):
Hashtag one cluster, sorry. Yes. Hashtag one cluster: use it, love it.

Mitch Henderson (32:49):
And that's where this architecture really, really shines because it is just one cluster, all the things that those clients do, all that plumbing you're talking about, we just inherit that. And it's all good.

Tim Berglund (33:02):
Of course. Okay. No, that makes sense. Uh, and we wouldn't want clients to have to change, 'cause that's, you know, fairly proven stuff, and you know, there's already various streams of work in Apache Kafka where those things get improved over time, and we kind of want to rely on the existing mechanism there. Alright. So I need to back up a little bit. I keep drilling down into things. I think they're interesting questions and I'm losing track of where I am. We were talking about when you need it, Anna. And you said when RPO, which is a point... just give me these again. RPO,

Anna McDonald (33:46):
How much of your live stream can you afford to miss,

Tim Berglund (33:48):
Right. And how long can it take you, uh, to fail over? And when those are small, obviously you don't get to specify zero, 'cause that's not realistic. 'Cause things fail and, you know, sockets take time to time out and all that kind of stuff. But when those are small, then you want MRC, and in general, that's the thing that you find drives people to want this feature.

Anna McDonald (34:08):
Yep.

Tim Berglund (34:09):
Okay. So we've kind of been digging into the client side, but I want to go there a little bit more and get a sense of what's difficult without MRC, uh, about managing failovers. Like, are there ever offset problems I have to deal with? It's not clear immediately to me that there would be, but do I have to keep track of offsets when I manage a failover, and what else in my life as a developer using this gets better?

Anna McDonald (34:38):
I can be negative. It's super fun. Watch me. So yeah, if you don't have hashtag one cluster, you have two separate clusters, you have a lot of problems, because you have to remember, currently, no matter if you use Replicator or something like MirrorMaker, it doesn't matter. You don't have offset parity. You have two separate clusters; offsets don't match. Even if you, you know, use something like Replicator, which has offset translation, you have to deal with the fact that this is async replication, which means there potentially could be a lag, which means that if I'm a producer, I have to know that all of my data might not have made it over there before the data center died. So I have to make sure I can reproduce any data that didn't make it over. As a consumer, I have to know where I left off.

Anna McDonald (35:27):
And not only that, I have to know how to map where I left off to the other cluster, because the offsets aren't the same. When it fails, I've got to go bring everything down, you know, update my bootstrap list, move over. There's just a lot of requirements in order to reach, um, a low, like... and again, right, this is driven by your RTO and RPO. If your recovery point objective is zero, meaning you can't lose any data, then the requirements I just said, you have to do. You have to be able to reproduce any data that didn't make it, right? And you have to know where you left off to make sure that you're not getting a ton of dupes, because most people do not like it.
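[Editor's note: To make the offset-mapping burden concrete, here is a toy model of two clusters holding the same records at different offsets, with the last record lost to replication lag. It maps "where I left off" by record timestamp. This is a simplified illustration and not how Replicator or MirrorMaker is implemented; `resume_offset`, the offsets, and the timestamps are all made up, and timestamps are assumed to be strictly increasing.]

```python
from bisect import bisect_right

# Each log is a list of (offset, timestamp_ms) pairs, in offset order.
# The same records sit at different offsets on each cluster.
source_log = [(100, 1000), (101, 1010), (102, 1020), (103, 1030), (104, 1040)]
# Async replication lag: the record with timestamp 1040 never made it over.
dest_log = [(7, 1000), (8, 1010), (9, 1020), (10, 1030)]


def resume_offset(log, last_processed_ts):
    """Offset of the first record in `log` newer than last_processed_ts."""
    timestamps = [ts for _, ts in log]
    idx = bisect_right(timestamps, last_processed_ts)
    if idx == len(log):
        # Everything replicated was already processed: resume at the log end.
        return log[-1][0] + 1 if log else 0
    return log[idx][0]


# Consumer side: last processed source offset 101 (timestamp 1010).
consumer_resume = resume_offset(dest_log, last_processed_ts=1010)

# Producer side: anything newer than the last replicated record must be re-sent.
last_replicated_ts = dest_log[-1][1]
to_reproduce = [offset for offset, ts in source_log if ts > last_replicated_ts]

print(consumer_resume)
print(to_reproduce)
```

[Resuming from source offset 102 directly would be wrong on the destination cluster, where that record lives at offset 9. With a single stretch cluster none of this bookkeeping exists, because there is only one set of offsets.]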

Mitch Henderson (36:09):
And one of the things I talk to a lot of folks about is deciding when to fail over. Like, if I have two clusters and they're asynchronously being replicated with MirrorMaker or Replicator, do I fail over when one broker goes down? When one leader goes offline? What's my criteria for this? It's much easier to hand this off to the existing clients and just use their logic, because it's tested and works.

Tim Berglund (36:35):
Sure. It's what they do. Okay. That makes sense. So the client experience is, you've got retries enabled, which let's assume that you do in this sort of scenario where you'd want something like MRC. Um, you're fine. Retries do their thing. And you're gonna experience significantly longer latency on that read or write as the failover is happening. But that's what happens, and it should be invisible to the developer. There's no fussing with offsets. There's no special thing you have to do, catching an exception or anything like that. Am I oversimplifying?

Anna McDonald (37:11):
I'm just, it gets, well, it gets a little worse. I just want to make sure I draw this horrible picture accurately, because it gets even worse when you talk about, yeah. It's like, it's like, well, not as bad as to you, but it's getting there now. It's with Kafka Streams in particular. Think about it. Like, your Kafka Streams, it's a flow, right? It all works together. And so the internal topics that are holding state, right? Those are replicated asynchronously. So you don't even have any semblance of assured consistent state. So in an active-passive setup, you can't even really replicate internal topics for Kafka Streams, because there's no assurance you'd get the right state. You could have, um, you know, things that were further. Yeah. I mean, it's a hot mess. And so the burden on the client for Kafka Streams and for upper-level APIs becomes even worse, to the point where it's really like, you need to run a stretch. You really do. Otherwise, it's going to be very unwieldy if you have those resiliency requirements. Yeah. Okay. For more information and gory details, see my Kafka Summit talk.

Tim Berglund (38:19):
Thank you for that. Um, and okay. I'm, I'm not quite sure. Uh, I don't think, let me put this positively. I think by the time this airs, there will be a link to that available online. I'm pretty sure this episode is going to go live after Kafka Summit.

Tim Berglund (38:36):
Um, and if I'm wrong, then, Hey, you're lucky. And, uh, pretty soon after you listen to this, you'll be able to listen to Anna's talk, what else? Uh, what else comes up? I feel like we're, I have a pretty good understanding of, of how this works and what the experience is like operationally and as a client developer, what are other kind of, uh, common topics that come up in the MRC world?

Mitch Henderson (39:03):
Yeah. So the other topic that almost always comes up is, I've got, you know, maybe I don't want to have a fully synchronous replica someplace and pay that produce latency cost. So one of the other features of MRC is the concept of an observer node. It's not a leader, it's not a follower, it's an observer node. And what that is, is an async replica. But it uses the same Kafka protocol that everything replicates to, or replicates with, excuse me. You know, and we actually have kind of a pattern that, uh, Addison Huddy, the PM for core Kafka at Confluent here, kind of coined the term for, and it's called KOZO: Kafka Observer, ZooKeeper Observer.

Mitch Henderson (39:57):
And what this allows us to do is not only have, you know, two data centers, for example, with synchronous replication, um, but we can also have a third data center sitting out there that's asynchronous, or multiple of these data centers for, say, point-of-presence data distribution, without paying all the write penalties.

Tim Berglund (40:22):
Of course. Okay. And when you say asynchronous replica, what that means is it's a replica, like any other in terms of the way it's got, you know, it's going out and scraping writes off its leader, but there isn't any, there isn't any producer that will wait for replication to happen to that node.

Mitch Henderson (40:43):
Exactly right. It's the same concept of how we traditionally do DR, right? We would produce the message into one cluster and it would eventually get out to the other cluster.
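[Editor's note: In Confluent Platform, which replicas are synchronous and which are observers is expressed per topic as a replica placement document in JSON. The sketch below builds such a document. It is modeled on the format in the Confluent Platform documentation as best the editor recalls it, so the field names should be checked against the docs for your version. The rack names and the `replica_placement` helper are hypothetical.]

```python
import json


def replica_placement(sync_racks, observer_racks):
    """Build a placement document: synchronous replicas plus async observers.

    Both arguments map a rack name (hypothetical here) to a replica count.
    """
    return {
        "version": 1,
        # Regular replicas: producers with acks=all wait on the in-sync ones.
        "replicas": [
            {"count": count, "constraints": {"rack": rack}}
            for rack, count in sync_racks.items()
        ],
        # Observers: replicate asynchronously, so no producer waits on them.
        "observers": [
            {"count": count, "constraints": {"rack": rack}}
            for rack, count in observer_racks.items()
        ],
    }


# Two data centers with synchronous replicas, a third with an observer only.
placement = replica_placement(
    sync_racks={"dc-east": 2, "dc-west": 2},
    observer_racks={"dc-central": 1},
)
placement_json = json.dumps(placement, indent=2)
print(placement_json)
```

[This matches the shape Mitch describes: two data centers share the synchronous write path, and the third receives the data eventually without adding to produce latency.]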

Anna McDonald (40:54):
What does KOZO stand for again?

Mitch Henderson (40:57):
Kafka, Observer, ZooKeeper, Observer.

Anna McDonald (41:00):
That's right. It's also a plant.

Tim Berglund (41:02):
It is. Is it? What, what kind of Anna? What kind of plant is it?

Anna McDonald (41:06):
It's a paper mulberry. I think the scientific name is Broussonetia papyrifera. And that's totally not 'cause I just looked it up. Yeah. Hashtag just Googled it. Nice. That's one point. I'll get it.

Tim Berglund (41:22):
My guests today have been Anna McDonald and Mitch Henderson. Thanks for being a part of Streaming Audio.

Anna McDonald (41:28):
Thank you, Tim. This was great. Yes.

Mitch Henderson (41:31):
Thank you, Tim.

Tim Berglund (41:32):
Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST. That's 60PDCAST to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st, 2021, and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeited. And there are a limited number of codes available, so don't miss out! Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at @tlberglund on Twitter. That's @tlberglund. Or, you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack sign-up link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel, and to this podcast, wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support. And we'll see you next time.