Streaming Audio: Apache Kafka® & Real-Time Data

Apache Kafka 2.7 - Overview of Latest Features, Updates, and KIPs

Confluent, original creators of Apache Kafka® Season 1 Episode 134

Apache Kafka® 2.7 is here! Here are the key Kafka Improvement Proposals (KIPs) and updates in this release, presented by Tim Berglund. 

KIP-497 adds a new inter-broker API to alter in-sync replicas (ISRs). Every partition leader maintains the ISR list or the list of ISRs. KIP-497 is also related to the removal of ZooKeeper.

KIP-599 has to do with throttling the rate of creating topics, deleting topics, and creating partitions. This KIP will add a new feature called the controller mutation rate.

KIP-612 adds the ability to limit the connection creation rate on brokers, while KIP-651 supports the PEM format for SSL certificates and private keys.

The release of Kafka 2.7 furthermore includes end-to-end latency metrics and sliding windows.

Find out what’s new with the Kafka broker, producer, and consumer, and what’s new with Kafka Streams in today’s episode of Streaming Audio!

EPISODE LINKS

Tim Berglund:
Welcome to Streaming Audio, a podcast about Kafka, Confluent, and the cloud. We've got some great stuff for you today. Hope you enjoy.

Tim Berglund:
Hi, I'm Tim Berglund with Confluent. I'm standing here in front of this stream on a rather chilly day in the suburbs of Denver, Colorado, to tell you all about Apache Kafka 2.7. Now as always, the things to talk about in a new release are the KIPs or the Kafka improvement proposals. A Kafka improvement proposal starts as a Wiki page that's kind of outlining the motivation for the change and the proposed approach to it. And then a bunch of work follows, that KIP gets merged into a release, and off we go. So today in 2.7, I've got two categories of KIPs to talk to you about Kafka core and Kafka streams. Let's get started on the Kafka core KIPs. KIP 497 is entitled, add inter broker API to alter ISR. Now let me explain some background. Every partition leader maintains the ISR list or the list of in-sync replicas.

Tim Berglund:
So that partition's got replica's probably placed on other brokers. Some of those are at any given time going to be in sync and some of them are not in sync and you can look up the definition of that. But just suffice to say, the ISR list is the list of replicas that are up-to-date and trustworthy on that account. Now, the controller can change the ISR list, which it has to do when it's managing fail-over. And of course, a partition leader can modify the ISR list if it determines that one of its followers has fallen out of sync. Traditionally, both of those actors they're the controller and the partition leader would both talk to Zookeeper directly. There's a Znode and potentially another Znode that they will tickle directly inside of Zookeeper to make that happen. And it can take a while then for the controller to have a consistent view of that state when the partition leader changes it.

Tim Berglund:
So you have these two actors changing a thing, and now you have to get consensus about what the current state of that thing is. Kind of a classical distributed systems problem, which we'd rather not have. So now with this KIP, what happens is there's a new admin API that partition leaders use to talk directly to the controller. And so they don't talk to Zookeeper directly anymore. Two good things about this. Number one, is you have basically instant consistency about the new ISR list, rather than taking up to a minute like it used to, and brokers aren't talking to Zookeeper. So this is one of the many things as we talk about KIP 500, as I've been talking about KIP 500 for the last year. Every time there's a KIP that's related to 500, it usually is an apron string being cut. Some little connection from something in the system to Zookeeper that's not a connection anymore, and that's the case here.

Tim Berglund:
So once all of these are done, then Zookeeper we'll just have removed all that connective tissue. It'll just kind of come out and we'll have a Zookeeper list Kafka. So this is related to that, but also makes its own improvements in ISR list modifications and the latency associated with them. KIP 599 has to do with throttling the rate of creating topics, deleting topics, and creating partitions. So when we're creating a new topic, which really means we're creating some number of new partitions, if that's happening at a high rate, that can overwhelm the controller if you have a sudden burst of topic creation, which could happen for any number of reasons. And so this really just has to do with throttling the rate at which those can occur at the controller. We have a new quota and a new associated quota manager that new quota is called controller mutation rate.

Tim Berglund:
So that's the number of partition mutations during a configurable period. What's that period? Well, that's a broker config controller quota window size in seconds. And so we have a certain number of those things that can happen within that window. We also have a new error if you exceed that quota, that throttling quota exceeded. And of course, the attendant metrics for us to be able to take a look at this stuff. This is just a way of loving the controller a little bit in situations where there might be client applications that are doing things that would overwhelm it. We have a nice responsible way of throttling it so that it doesn't fall over. KIP 612 gives us the ability to limit connection create rate on brokers. Now it has long been possible to throttle the rate of connection attempts from unauthenticated clients, because that's just an obvious source of malfeasance, right?

Tim Berglund:
Somebody can DOS you if they can just if they have to access to a port 9092 and some bootstrap servers, then they can just kind of hammer away and kill things. So that's always been possible or long been possible to throttle that. But what about authenticated clients? Now we trust them a little more, but like Baron Harkonnen broke the imperial conditioning of Dr. Hughie, sometimes they go South or sometimes they can get confused and just accidentally do the wrong thing. So this KIP has to do with throttling those authenticated clients and potential connection storms from them. So we've got a few new config parameters. That's the max connection creation rate. That's the rate per second and the maximum connection creation attempts per second that we'll tolerate. And in the Kafka configs command line tool we've got some new switches. You see their entity type and entity name.

Tim Berglund:
If we want to specify that rate with respect to a particular source IP, we can now do that with the Kafka configs command line tool. KIP 651 gives us the ability to use the PEM format for certificates and for private keys. Now, historically Kafka has given us PKCS12 and JKS. That's the Java key store file formats as ways of storing certificates and key pairs. And I know you don't want to rush into these things, but we've added PEM support. And the super cool thing about this now is that just the PEM file basically can be the value of a config parameter. And so with dynamic broker reconfiguration, now we have the ability to change keys and change certs dynamically through the config file and through using the PEM file format, which mileage may vary, and everyone is entitled to her or his own opinion.

Tim Berglund:
But personally, I would much rather deal with PEM files than JKS files or PKCS12 files and all the command line tools that you have to use to manipulate them. So this is a huge improvement in usability. And as you can see, we've got the SSL key store certificate chain and SSL key store key config parameters now that work along with this. KIP 545 support automated consumer offset sync across clusters in MirrorMaker 2.0. Now MirrorMaker 2.0 was merged in KIP 382 in those Apache Kafka 2.4. And what it does is it keeps topics in sync between Kafka clusters, it's the multi data center multi-cloud or hybrid cloud solution in AK. And consumer offsets between those topics don't automatically stay in sync between the two clusters. And that makes sense if you think about it. MirrorMaker acts as a Kafka connect connector, and there's no reason those offsets would stay in sync by themselves.

Tim Berglund:
And now there is an automated way to get that done prior to 2.7 as an operator, you had to run external tooling to translate those offsets, but now with a configuration setting, MirrorMaker 2.0 can automatically translate and sync those offsets, so consumers can kind of bounce back and forth or fail over between clusters a little bit more easily. Not enabled by default, but that config that you see there and it checkpoints enabled and a primary backups in group offsets enabled, set those to true and away you go. So we've got a couple of Kafka streams KIPs. Let's take a look at those. We've got KIP 450. Now this is a KIP led by a friend of mine, Leah Thomas, just over the last few months that has to do with sliding windows. Now, historically in Kafka streams, you've got tumbling hopping and session windows.

Tim Berglund:
And I will not give you a detailed description of all of those right now, but let's just talk about hopping real quick. So hopping window is where the window has a certain width. And then after some discreet period of time, it hops. Say it's a ten second window, and every second it hops one second forward. And that can be very useful for certain kinds of aggregations. But some people were finding that they really wanted that hopping time to approach zero or the smallest possible quantum of time that streams would deal with. So you'd find people making one second or one millisecond hopping windows, which could work sometimes. But all of those windows that got created in that very, very small hop, those used memory and resources, and had to be kept around for later arriving data and all the sorts of things that attend extant windows.

Tim Berglund:
So that was bad. Now we have sliding windows, which allows us to define a window width and what amounts to a continuous sliding over time, such that each event that shows up in a window only shows up in one of them as it slides. And I say, continuous, it looks continuous, right? But of course there are discrete time steps, just the smallest possible time step that it takes. So a brief summary there of sliding windows, huge improvement for the aggregations where people were doing the hopping window hack, and hopefully this will change your life if you were one of those people. KIP 613 gives us some new metrics for computing end to end latency in a streams application. This is often what you really want to know, right? There are lots of latencies that you can study and when you're debugging and optimizing things, then certainly, there's more than one latency that you're going to have to look at to understand where your hotspots are and what's really going on and what can be improved.

Tim Berglund:
But when you just want to look at latency, what's the thing you want? You want end to end. Where does the event start and where does it stop in terms of my stream processing topology? And this KIP gives us three new metrics associated with that min, max, and average. So this should be something that makes it hugely easier for you to look at end to end latency in a stream processing applications. Now just an automatic thing that's exposed for you, and you don't have to go to any weird lengths of trying to compute that or back it out of any other measurements by yourself.

Tim Berglund:
So that's it. That's Apache Kafka, 2.7. A fairly simple release. I do strongly encourage you to download it and try it out. Remember, there's always the release blog post that goes into more detail. It's very simple to Google any one of these KIPs. You get the Wiki page, they are super friendly. They are worth digging into in detail, but what is most of all worth it is that you download 2.7 and try it out and get building something. And as always let us know what that is. We always want to hear what you're working on. Thanks.

Tim Berglund:
And there you have it. I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at @tlberglund on Twitter. That's @T-L-G-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out in community Slack. There's a Slack sign-up link in the show notes. If you want to register there. And while you're at it, please subscribe to our YouTube channel. And to this podcast, wherever fine podcasts are sold. And if you subscribe through iTunes, be sure to leave us a review there. That helps other people discover the podcast, which we think is a good thing. So thanks for your support. And we'll see you next time.