Streaming Audio: Apache Kafka® & Real-Time Data

Apache Kafka 3.4 - New Features & Improvements

Confluent, founded by the original creators of Apache Kafka®

Season 1 Episode 256

Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent), shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect.

In Kafka Core:

  • KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that consumer was a part of. 
  • KIP-830 includes a new configuration setting that allows you to disable the JMX reporter for environments where it’s not being used. 
  • KIP-854 introduces changes to clean up producer IDs more efficiently, to avoid excess memory usage. It introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs.
  • KIP-866 (early access) provides a bridge to migrate from existing ZooKeeper clusters to new KRaft mode clusters, enabling the migration of existing metadata from ZooKeeper to KRaft.
  • KIP-876 adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is 1 hour.
  • KIP-881, an extension of KIP-392, makes it so that consumers can now be rack-aware when it comes to partition assignments and consumer rebalancing. 

In Kafka Streams:

  • KIP-770 updates some Kafka Streams configs and metrics related to the record cache size.
  • KIP-837 allows users to multicast result records to every partition of downstream sink topics and adds functionality for users to choose to drop result records without sending.

And finally, for Kafka Connect:

  • KIP-787 allows users to run MirrorMaker2 with custom implementations for the Kafka resource manager and makes it easier to integrate with your ecosystem.

Tune in to learn more about the Apache Kafka 3.4 release!

Danica Fine (00:00):

Welcome to Streaming Audio. I'm Danica Fine, developer advocate at Confluent. You're listening to a special episode where I have the honor of announcing the Apache Kafka 3.4 release on behalf of the Kafka community. There are so many great KIPs in this release, so let's get to it. As usual, the release is broken up based on what the KIP pertains to, and this release will cover updates from Kafka Core, Kafka Streams, and Kafka Connect. First up, for Kafka Core we have KIP-866, which provides a bridge to migrate from existing ZooKeeper clusters to new KRaft mode clusters. With this change, you'll be able to migrate your existing metadata from ZooKeeper to KRaft. After metadata is synced between the two clusters using dual-write mode, you can safely transition control to KRaft controllers, and, just in case, the change also allows you to fall back to ZooKeeper during the migration.


Danica Fine (00:46):

This KIP gives you more flexibility to try out KRaft mode. But keep in mind that these changes are just an early access release. General availability of this feature will be a part of Apache Kafka 3.5. Next up is KIP-830. This change treats the JMX reporter like all other reporters so that it can be disabled for environments where it's not being used. The KIP includes a new configuration setting, auto.include.jmx.reporter, to support disabling the reporter. When set to false, a JMX reporter won't be instantiated, and instead only the reporters set via metric.reporters will be used. When set to true, which is the default, a deprecation warning is printed directing the user to use metric.reporters instead. In Apache Kafka 4.0, expect the default value for metric.reporters to be set to the JMX reporter. With KIP-881, consumers can now be rack-aware when it comes to partition assignments and consumer rebalancing.
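To make the KIP-830 behavior concrete, here is a minimal Python sketch of how the two settings interact as described in the episode. The function name `effective_reporters` is hypothetical, and the logic is an illustration of the described behavior, not Kafka's actual implementation; the class name `org.apache.kafka.common.metrics.JmxReporter` is the JMX reporter that ships with Kafka.

```python
# Illustrative model only: mimics how auto.include.jmx.reporter and
# metric.reporters are described as interacting in Kafka 3.4.
JMX_REPORTER = "org.apache.kafka.common.metrics.JmxReporter"


def effective_reporters(config):
    """Return the reporter class names that would end up being used.

    `config` is a plain dict of property name -> string value
    (a hypothetical stand-in for a real Kafka configuration object).
    """
    raw = config.get("metric.reporters", "")
    reporters = [name.strip() for name in raw.split(",") if name.strip()]

    # Default is "true": the JMX reporter is added implicitly.
    auto_include = config.get("auto.include.jmx.reporter", "true").lower() == "true"
    if auto_include and JMX_REPORTER not in reporters:
        reporters.append(JMX_REPORTER)
    return reporters


if __name__ == "__main__":
    print(effective_reporters({}))
    print(effective_reporters({"auto.include.jmx.reporter": "false"}))
```

With the flag set to false, JMX metrics would only be exposed if the JMX reporter is listed explicitly in metric.reporters.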


Danica Fine (01:37):

In the past, KIP-392 enabled consumers to fetch from their closest replica. KIP-881 is an extension which allows consumers to fetch data from leaders or followers within the same availability zone when possible to benefit from locality. These changes are also a stepping stone to the consumer group protocol work being done as part of KIP-848, which will introduce rack-aware partition assignments in both the server-side partition assignors and the client-side partition assignors. Check out KIP-881 for more details on how these changes will impact future design of client protocols. Next up is KIP-876. Cluster metadata snapshots are used to compact the underlying log segments and clean up redundant records. In the past, these snapshots were triggered based on the number of bytes that had been appended to the log since the last snapshot.
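As a rough illustration of what rack-aware assignment means in KIP-881, the following Python sketch prefers a consumer in the same rack as one of a partition's replicas. This is a hypothetical greedy heuristic written for this transcript, not the algorithm used by any Kafka assignor; the function name and input shapes are assumptions.

```python
def assign_rack_aware(partition_racks, consumer_racks):
    """Greedy, illustrative rack-aware assignment.

    partition_racks: dict of partition id -> set of racks holding a replica
    consumer_racks:  dict of consumer id -> rack of that consumer
    Returns a dict of consumer id -> list of assigned partition ids.
    """
    assignment = {consumer: [] for consumer in consumer_racks}
    for partition in sorted(partition_racks):
        racks = partition_racks[partition]
        # Consumers that share a rack with at least one replica.
        local = [c for c, rack in consumer_racks.items() if rack in racks]
        # Fall back to every consumer when no rack-local one exists.
        candidates = local or list(consumer_racks)
        # Pick the least-loaded candidate; break ties by consumer id.
        chosen = min(candidates, key=lambda c: (len(assignment[c]), c))
        assignment[chosen].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = {0: {"a"}, 1: {"b"}, 2: {"a", "b"}, 3: {"c"}}
    consumers = {"c1": "a", "c2": "b"}
    print(assign_rack_aware(partitions, consumers))
```

A real assignor also has to keep the assignment balanced and stable across rebalances, which this sketch does not attempt.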


Danica Fine (02:22):

But seeing as snapshots are also used as cluster backups, it makes sense to generate snapshots based on time, and KIP-876 enables this. To do so, this KIP adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is one hour. And finally, we have KIP-854, which introduces changes to clean up producer IDs more efficiently. Idempotent producers and transactions are essential to Kafka's exactly-once semantics, and they both require some IDs in order to work properly. Idempotent producers are assigned a producer ID automatically at startup. Transactions are meant to offer guarantees across topic partitions and producer sessions, so they require both a producer ID and a user-provided transaction ID.
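The two snapshot triggers discussed for KIP-876 can be sketched as a single predicate. This is a simplified Python model, not Kafka code: the function name and the byte threshold default are assumptions, and the one-hour default comes from the episode. As far as I can tell from the KIP, the new property is named metadata.log.max.snapshot.interval.ms, and the time trigger only applies when there are records not yet covered by a snapshot.

```python
ONE_HOUR_MS = 60 * 60 * 1000


def should_snapshot(bytes_since_last, ms_since_last,
                    max_bytes=20 * 1024 * 1024,      # assumed size threshold
                    max_interval_ms=ONE_HOUR_MS):    # default stated in the episode
    """Return True if a new metadata snapshot should be generated."""
    if bytes_since_last <= 0:
        # Nothing new in the log, so a snapshot would add no information.
        return False
    size_trigger = bytes_since_last >= max_bytes        # pre-existing behavior
    time_trigger = ms_since_last >= max_interval_ms     # added by KIP-876
    return size_trigger or time_trigger


if __name__ == "__main__":
    # A quiet cluster: few bytes written, but more than an hour has passed.
    print(should_snapshot(bytes_since_last=1024, ms_since_last=ONE_HOUR_MS + 1))
```

The time trigger matters most for quiet clusters, where the size threshold alone could leave the latest snapshot far out of date.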


Danica Fine (03:02):

In the past, both producer IDs and transaction IDs would expire and be cleaned up using a single timeout parameter. KIP-679 made all producers idempotent by default, so there are quite a lot more producer IDs floating around. To avoid excess memory usage, there needs to be a way to expire and clean up producer IDs independent of transaction IDs. KIP-854 introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs. Moving on to Kafka Streams, we have KIP-837, which allows users to multicast result records to every partition of downstream sink topics. The change also adds functionality for users to choose to drop result records without sending. To achieve this, a few changes were made. First, the StreamPartitioner interface was given a new method called partitions, which returns an optional set of partitions.
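For KIP-854, the split into two independent timeouts can be modeled in a few lines of Python. The class and function names are hypothetical. The attribute names mirror what I understand the broker properties to be (producer.id.expiration.ms and transactional.id.expiration.ms), and the default values are assumptions that should be checked against the broker documentation.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class ExpiryConfig:
    # Assumed defaults; verify against the Kafka broker configuration docs.
    producer_id_expiration_ms: int = DAY_MS
    transactional_id_expiration_ms: int = 7 * DAY_MS


def producer_id_expired(idle_ms, cfg):
    """Idle producer ID state is governed only by the new timeout."""
    return idle_ms >= cfg.producer_id_expiration_ms


def transactional_id_expired(idle_ms, cfg):
    """Idle transactional IDs are governed only by the old timeout."""
    return idle_ms >= cfg.transactional_id_expiration_ms


if __name__ == "__main__":
    cfg = ExpiryConfig()
    idle = 2 * DAY_MS
    # After two idle days, the producer ID can be cleaned up while the
    # transactional ID is still retained.
    print(producer_id_expired(idle, cfg), transactional_id_expired(idle, cfg))
```

Because the two timeouts are independent, the many producer IDs created by idempotent-by-default producers can be reclaimed sooner without shortening the lifetime of transactional IDs.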


Danica Fine (03:53):

Next, the RecordCollector and its implementing class now accept a StreamPartitioner object. The RecordCollector uses the partitions method from the StreamPartitioner to determine which partitions to send the record to. And don't worry, the KeyQueryMetadata class was also updated to account for the fact that a single key could be present in multiple partitions. And rounding out our updates for Kafka 3.4, we have one KIP for Kafka Connect, specifically for MirrorMaker 2. In the past, MirrorMaker 2 used the built-in Kafka admin client and made a number of assumptions about the ACLs and administrative control that a particular user must have in order to run. Although these assumptions simplified MirrorMaker 2 resource management, they particularly affected those trying to run federated or infrastructure-as-code solutions. KIP-787 bypasses these hurdles by allowing users to run with custom implementations for the Kafka resource manager and integrate more easily with their ecosystems.
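The multicast and drop behavior from KIP-837 comes down to three possible results from the partitioner. The sketch below is a Python analogue of that contract, not the Java API itself: `None` stands in for an empty Optional (use default partitioning), an empty set means drop the record, and a non-empty set means send to each listed partition. The names `route` and `default_partition` are hypothetical, and the CRC32 hash is only a stand-in for Kafka's real default partitioner.

```python
import zlib


def default_partition(key, num_partitions):
    """Stand-in for default partitioning (Kafka uses a different hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(key, num_partitions, partitions_fn):
    """Return the list of partitions a record with `key` is sent to."""
    chosen = partitions_fn(key, num_partitions)
    if chosen is None:
        # No opinion from the partitioner: fall back to the default.
        return [default_partition(key, num_partitions)]
    # An empty set yields an empty list, i.e. the record is dropped.
    return sorted(chosen)


def broadcast(key, num_partitions):
    """Multicast to every partition of the sink topic."""
    return set(range(num_partitions))


def drop(key, num_partitions):
    """Drop the record without sending it."""
    return set()


def no_opinion(key, num_partitions):
    """Defer to default partitioning."""
    return None


if __name__ == "__main__":
    print(route("order-1", 3, broadcast))
    print(route("order-1", 3, drop))
    print(route("order-1", 3, no_opinion))
```

Since one key can now land in several partitions, any lookup of where a key lives has to handle a set of partitions, which is why the key metadata class mentioned above needed updating.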


Danica Fine (04:46):

Those are the highlights from this latest Apache Kafka release. Thank you for taking the time to listen to this special episode. If you have any questions or would like to discuss, you can reach out on our community forum or Slack. Both are linked in the show notes. If you're listening on Apple Podcasts or other podcast platforms, please be sure to leave a review. We'd love to hear your feedback. If you're watching on YouTube, please subscribe so you'll be notified of updates that you might be interested in. Thanks again for your support and see you next time.