Streaming Audio: Apache Kafka® & Real-Time Data

Apache Kafka 3.3 - KRaft, Kafka Core, Streams, & Connect Updates

October 03, 2022 Confluent, founded by the original creators of Apache Kafka® Season 1 Episode 237

Apache Kafka® 3.3 is released! With over two years of development, KIP-833 marks KRaft as production ready for new AK 3.3 clusters only. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of this release, with KIPs from Kafka Core, Kafka Streams, and Kafka Connect. 

To reduce request overhead and simplify client-side code, KIP-709 extends OffsetFetch API requests to accept multiple consumer group IDs. The update has three parts: extending the wire protocol, changing response handling, and enhancing the AdminClient to use the new protocol.

Log recovery is an important process that is triggered whenever a broker starts up after an unclean shutdown. Until now, there was no way to know the log recovery progress other than checking whether the broker log is busy, so KIP-831 adds two metrics, `RemainingLogsToRecover` and `RemainingSegmentsToRecover`, for each recovery thread. These metrics allow the admin to monitor the progress of the log recovery.

Additional Kafka Core updates include KIP-841 (fenced replicas should not be allowed to join the ISR in KRaft), KIP-835 (monitor KRaft controller quorum health), and KIP-859 (add metadata log processing error-related metrics).

KIP-834 for Kafka Streams adds the ability to pause and resume topologies. This feature lets you reduce resource usage when processing is not required, when you are modifying the logic of Kafka Streams applications, or when you are responding to operational issues. KIP-820, meanwhile, extends the KStream process with a new Processor API.

Previously, KIP-98 added support for exactly-once delivery guarantees with Kafka and its Java clients. In the AK 3.3 release, KIP-618 brings exactly-once semantics support to Kafka Connect source connectors. To accomplish this, a number of new connector- and worker-based configurations have been introduced, including `exactly.once.source.support`, `transaction.boundary`, and more.

Image attribution: Apache ZooKeeper™: https://zookeeper.apache.org/ and Raft logo:  https://raft.github.io/  

EPISODE LINKS

Danica Fine: (00:00)
Welcome to Streaming Audio. I'm Danica Fine, Senior Developer Advocate at Confluent. You're listening to a special episode where I have the honor of announcing the Apache Kafka 3.3 release on behalf of the Kafka community. There are so many great KIPs and updates to highlight in this release, so let's get to it.

Danica Fine: (00:25)
Hi, I'm Danica Fine with Confluent, here to tell you what's new in Apache Kafka 3.3. We have a number of great KIPs in this release, so without further ado, let's jump right in.

Danica Fine: (00:35)
As usual, the release is broken up based on what the KIP pertains to. In this release, we'll cover updates from Kafka Core, Kafka Streams, and Kafka Connect. First up for Kafka Core, we have KIP-709, which affects OffsetFetch requests. Currently, applications have to submit a separate OffsetFetch request to the group coordinator for each group ID – as you can imagine, this can be tedious. With KIP-709, the process has been streamlined so that a single request can be made to fetch offsets for multiple groups. Overall, this reduces request overhead and simplifies client-side code.
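
As a rough illustration of the request savings described above, the sketch below counts OffsetFetch requests before and after batching. It is a simplified model, not the Kafka wire protocol or client code: the function name `plan_offset_fetch_requests` and the `group_to_coordinator` table are hypothetical, and the sketch assumes that groups sharing a coordinator can travel in one request.

```python
from collections import defaultdict


def plan_offset_fetch_requests(group_to_coordinator, batched):
    """Return a list of (coordinator, [group_ids]) requests to send.

    group_to_coordinator is a hypothetical lookup table; a real client
    has to discover each group's coordinator before fetching offsets.
    """
    if not batched:
        # Older behaviour: one OffsetFetch request per group ID.
        return [(coord, [group])
                for group, coord in sorted(group_to_coordinator.items())]

    # Batched behaviour (simplified assumption): one request per
    # coordinator, carrying every group ID assigned to that coordinator.
    by_coordinator = defaultdict(list)
    for group, coord in sorted(group_to_coordinator.items()):
        by_coordinator[coord].append(group)
    return sorted(by_coordinator.items())


# Hypothetical consumer groups mapped to coordinator broker IDs.
groups = {"billing": 1, "audit": 1, "search": 2, "alerts": 1}

print(len(plan_offset_fetch_requests(groups, batched=False)))  # 4
print(len(plan_offset_fetch_requests(groups, batched=True)))   # 2
```

With four groups spread over two coordinators, the per-group plan sends four requests while the batched plan sends two.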

Danica Fine: (01:15)
KIP-824 affects the kafka-dump-log.sh tool, which allows operators to sample logs from topics. Now, unfortunately, this tool used to inefficiently dump the entire log. But, with KIP-824, the new max-bytes parameter allows operators to sample only a small slice of the log in a more efficient and scalable way.
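
For operators who script this tool, a minimal sketch of assembling such a command might look like the following. The helper `dump_log_command`, the script location, and the segment path are assumptions for illustration; only the `--files` and `--max-bytes` options come from the tool itself. The sketch builds the command line and prints it without running anything.

```python
import shlex


def dump_log_command(segment_file, max_bytes, script="bin/kafka-dump-log.sh"):
    """Build an argument list that reads at most max_bytes of a segment."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return [script, "--files", segment_file, "--max-bytes", str(max_bytes)]


# Hypothetical segment path; substitute a real .log file from a log dir.
cmd = dump_log_command(
    "/var/kafka-logs/orders-0/00000000000000000000.log", 4096
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` on a host where the Kafka distribution is installed.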

Danica Fine: (01:37)
Also impacting operators is KIP-827, which exposes both the log directory total bytes and usable bytes metrics via the Kafka API. These can be used to programmatically check the state of various disk-related operations.

Danica Fine: (01:52)
Log recovery is an important process that is triggered whenever a broker starts up after an unclean shutdown. To better monitor this process, KIP-831 introduces two new metrics: RemainingLogsToRecover and RemainingSegmentsToRecover. Both metrics are offered on a per-thread basis.

Danica Fine: (02:12)
And finally, we have KIP-851. With the OffsetFetchRequest, there's a boolean requireStable flag to indicate whether to tolerate pending transactional offset commits in the group coordinator. The AdminClient uses OffsetFetchRequest in its listConsumerGroupOffsets method, where the requireStable flag is always set to false, which isn't very useful for getting committed offsets with exactly-once semantics. With this KIP, the AdminClient can now choose to either include or exclude pending transactional offsets.

Danica Fine: (02:47)
Next up we have a couple of Kafka Streams KIPs. KIP-820 extends Kafka Streams to use the latest processor API. With this update, it's now simple to chain results from processors and have more control over what's forwarded.

Danica Fine: (03:03)
The second and final update for Kafka Streams is KIP-834. This change adds pause and resume to Kafka Streams topologies, allowing users to pause the processing, punctuation, and standby tasks of the topology. Overall, this makes it easier to do a number of things, like reduce resource usage when processing is not required, modify the logic of Kafka Streams applications, and respond to operational issues.

Danica Fine: (03:29)
And finally, we have an exciting Kafka Connect KIP that should be music to everyone's ears. KIP-618 introduces exactly-once support for source connectors. To achieve this, a number of new connector- and worker-based configurations have been introduced. Now keep in mind that not all source connectors will support this, so I recommend that you read the details on the KIP itself to understand which of your connectors can take advantage of this new functionality.
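
To make the worker and connector settings a little more concrete, here is a minimal sketch that assembles them in Python. The property names `exactly.once.source.support`, `exactly.once.support`, and `transaction.boundary` are the ones defined by KIP-618; the connector name, the connector class `com.example.OrdersSourceConnector`, and the helper `connector_config` are hypothetical. Treat it as a starting point to check against the KIP, not a complete configuration.

```python
import json

# Worker-level setting (distributed mode). KIP-618 also defines a
# "preparing" value that is used while rolling the change out.
WORKER_PROPS = {"exactly.once.source.support": "enabled"}

# Transaction boundary modes defined by KIP-618.
BOUNDARIES = {"poll", "interval", "connector"}


def connector_config(name, connector_class, boundary="poll"):
    """Build a connector creation payload that requires exactly-once."""
    if boundary not in BOUNDARIES:
        raise ValueError(f"unknown transaction.boundary: {boundary}")
    return {
        "name": name,
        "config": {
            "connector.class": connector_class,
            # "required" asks the worker to reject the connector if it
            # cannot provide exactly-once delivery.
            "exactly.once.support": "required",
            "transaction.boundary": boundary,
        },
    }


# Hypothetical connector name and class, for illustration only.
payload = connector_config("orders-source", "com.example.OrdersSourceConnector")

print("\n".join(f"{k}={v}" for k, v in WORKER_PROPS.items()))
print(json.dumps(payload, indent=2))
```

The first line printed belongs in the worker properties file, and the JSON document has the name-plus-config shape that the Connect REST API accepts when creating a connector.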

Danica Fine: (03:57)
All right, that's all that I have to share for now. Until the next release.

Danica Fine: (04:04)
What are you doing here?

Danica Fine: (04:15)
It's happening. That's right, folks. With KIP-833, KRaft has finally been marked production ready for new clusters only. And just so you know, there are a few limitations. You won't want to use KRaft in production for new clusters if you're using SCRAM, JBOD, certain dynamic configurations, or delegation tokens. Check out the KIP for more details.

Danica Fine: (04:39)
A ton of work went into making this happen, and quite a lot has been done to improve KRaft since Apache Kafka 3.2, such as ensuring fenced replicas don't join the ISR in KRaft mode, improving controller health checks, and adding new error metrics. It all adds up to production-ready KRaft for new clusters.

Danica Fine: (04:57)
Now, when can existing ZooKeeper mode clusters migrate to KRaft? Great question, I'd love to tell you. While the timeline is subject to change, currently you can expect to be able to upgrade existing clusters from ZooKeeper mode to KRaft mode with Apache Kafka 3.5. The community is targeting early access with limited unstable APIs as part of release version 3.4, and the current plan is to have version 3.5 be a bridge release. At the same time, you can expect ZooKeeper to be deprecated and then removed as part of Apache Kafka 4.0. And that's all I have to share for now.

Danica Fine: (05:34)
There are a ton of KIPs involved in this release though, so as usual, I encourage you to head on over to our Confluent blog or take a look at the release notes to check them out in more detail. And of course, I look forward to hearing about what you build.

Danica Fine: (05:47)
Those are the highlights from this latest Apache Kafka release. Thank you for listening to this very special episode. If you have any questions or would like to discuss, you can reach out on our community forum or Slack. Both are linked in the show notes. If you happen to be listening on Apple Podcasts or other podcast platforms, please be sure to leave a review. We'd love to hear your feedback, and if you're watching on YouTube, please subscribe so you'll be notified of updates you might be interested in. Thanks for your support, and see you next time.

Intro
KIP-709: Extend OffsetFetch requests to accept multiple group ids.
KIP-824: Allowing dumping segmentlogs limiting the batches in the output
KIP-827: Expose logdirs total and usable space via Kafka API
KIP-831: Add metric for log recovery progress
KIP-851: Add requireStable flag into ListConsumerGroupOffsetsOptions
KIP-820: Extend KStream process with new Processor API
KIP-834: Pause / Resume KafkaStreams Topologies
KIP-618: Exactly-Once Support for Source Connectors
KIP-833: Mark KRaft as Production Ready
It's a wrap