Streaming Audio: Apache Kafka® & Real-Time Data

Confluent Platform 7.1: New Features + Updates

April 12, 2022 · Confluent, original creators of Apache Kafka® · Season 1, Episode 208

Confluent Platform 7.1 expands upon its already innovative features, adding improvements in key areas that benefit data consistency, allow for increased speed and scale, and enhance resilience and reliability.

Previously, the Confluent Platform 7.0 release introduced Cluster Linking, which enables you to bridge on-premises and cloud clusters, among other configurations. Maintaining data quality standards across multiple environments can be challenging though. To assist with this problem, CP 7.1 adds Schema Linking, which lets you share consistent schemas across your clusters—synced in real time.

Confluent for Kubernetes lets you build your own private-cloud Apache Kafka® service. Now you can enhance the global resilience of your architecture by deploying to multiple regions. With the new release you can also configure custom volumes attached to Confluent deployments, and you can declaratively define and manage the new Schema Links. As of this release, Confluent for Kubernetes supports the full feature set of the Confluent Platform.

Tiered Storage was released in Confluent Platform 6.0, and it offers immense benefits for a cluster by allowing the offloading of older topic data out of the broker and into slower, long-term object storage. The reduced amount of local data makes maintenance, scaling out, recovery from failure, and adding brokers all much quicker. CP 7.1 adds compatibility for object storage using Nutanix, NetApp, MinIO, and Dell, integrations that have been put through rigorous performance and quality testing.
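As a rough illustration of the two storage layers described above, the sketch below renders a broker properties fragment for Tiered Storage backed by an S3-compatible object store. The helper name, bucket, region, and endpoint values are hypothetical, and the `confluent.tier.*` property names should be checked against the Tiered Storage documentation for your release and storage vendor before use.

```python
from typing import Optional


def tiered_storage_properties(
    bucket: str,
    region: str,
    endpoint_override: Optional[str] = None,
    hot_set_ms: Optional[int] = None,
) -> str:
    """Render a broker properties fragment for Tiered Storage (illustrative only)."""
    props = {
        # Turn the feature on and make tiering the default for topics.
        "confluent.tier.feature": "true",
        "confluent.tier.enable": "true",
        # S3-compatible stores are assumed to use the S3 backend.
        "confluent.tier.backend": "S3",
        "confluent.tier.s3.bucket": bucket,
        "confluent.tier.s3.region": region,
    }
    if endpoint_override is not None:
        # Point the broker at an on-prem, S3-compatible endpoint (assumed setting).
        props["confluent.tier.s3.aws.endpoint.override"] = endpoint_override
    if hot_set_ms is not None:
        # How long data stays on local broker disk after it has been tiered.
        props["confluent.tier.local.hot.set.ms"] = str(hot_set_ms)
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical values; replace them with the ones for your environment.
    print(
        tiered_storage_properties(
            bucket="kafka-tiered-data",
            region="us-east-1",
            endpoint_override="https://objects.example.internal",
            hot_set_ms=60 * 60 * 1000,
        )
    )
```

Keeping the local hot set small is what shrinks the amount of data a broker has to move during scaling or recovery; the right value depends on how often consumers read older data.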

Health+, introduced in CP 6.2, offers intelligent cloud-based alerting and monitoring tools in a dashboard. New as of CP 7.1, you can choose to be alerted when anomalies in broker latency are detected, when there is an issue with the connectors linking Kafka and external systems, and when a ksqlDB query will interfere with a continuous, real-time processing stream.

Shipping with CP 7.1 is ksqlDB 0.23, which adds support for pull queries against streams as opposed to only against tables—a milestone development that greatly helps when debugging since a subset of messages within a topic can now be inspected. ksqlDB 0.23 also supports custom schema selection, which lets you choose a specific schema ID when you create a new stream or table, rather than use the latest registered schema. A number of additional smaller enhancements are also included in the release.
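To make the two ksqlDB features above more concrete, here is a minimal sketch that builds the corresponding statements as strings. The stream name, topic name, schema ID, and helper functions are hypothetical; the statement shapes (a `SELECT` without `EMIT CHANGES` for a pull query, and a `VALUE_SCHEMA_ID` property in the `WITH` clause) are written from memory of the ksqlDB syntax and should be verified against the documentation for your version.

```python
def pull_query_on_stream(stream: str, where: str) -> str:
    """Build a pull query: no EMIT CHANGES clause, so it returns and terminates."""
    return f"SELECT * FROM {stream} WHERE {where};"


def create_stream_with_schema_id(
    stream: str, topic: str, schema_id: int, value_format: str = "AVRO"
) -> str:
    """Build a CREATE STREAM statement pinned to a specific registered schema ID."""
    return (
        f"CREATE STREAM {stream} WITH ("
        f"KAFKA_TOPIC='{topic}', "
        f"VALUE_FORMAT='{value_format}', "
        f"VALUE_SCHEMA_ID={schema_id});"
    )


if __name__ == "__main__":
    # Hypothetical names; paste the printed statements into the ksqlDB CLI to try them.
    print(create_stream_with_schema_id("orders", "orders-topic", schema_id=7))
    print(pull_query_on_stream("orders", "order_id = 42"))
```

The helpers interpolate their arguments directly, so they are only suitable for trusted, hand-written values such as the ones in this example.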


Danica Fine:
Hi there, and welcome to another episode of Streaming Audio! I’m Danica Fine, senior developer advocate with Confluent, here to tell you all about the Confluent Platform 7.1 release. Before we get to the details of what’s new in 7.1, Streaming Audio is brought to you by Confluent Developer, that’s developer.confluent.io—a website with everything you need to get started learning Apache Kafka®. There are executable tutorials, a library of event-driven design patterns, video courses, and everything you’ll need to build your own event streaming application. Check out the details in this special episode of Streaming Audio. With that, let's dive into the updates for this latest release.

Danica Fine:
Streaming data has become critical to the success of modern businesses. Leveraging real-time data is what enables companies like Instacart to set themselves apart, delivering the rich digital experiences and data-driven backend operations that delight customers. For the many businesses that operate across a multitude of on-prem data centers and cloud providers, unlocking these digital initiatives requires sharing high-quality, consistent data streams between environments.

Danica Fine:
In Confluent Platform 7.0, we introduced Cluster Linking to provide the best way to bridge on-prem and cloud environments, allowing data to flow in real time in a simple, secure, reliable, and cost-effective manner. This democratized access to real-time data throughout the organization, giving teams self-service access to the data wherever it resides through global data sharing. Operating with connected clusters across environments introduces an increased need for globally enforced standards around data quality. Each new linked cluster brings with it the challenge of maintaining data compatibility and regulatory compliance with information security policies.

Danica Fine:
The problem is that creating a consistent data layer that satisfies the needs of a modern business is challenging due to data sprawl resulting in silos, not having common standards enforced across different environments, the inability to scale to demand, and prohibitive data storage expenses.

Danica Fine:
To truly unlock data for next-generation digital initiatives, development teams and data experts need access to a scalable, globally resilient data mesh that can easily and quickly deliver trusted, compliant data across all environments.

Danica Fine:
With the release of Confluent Platform 7.1, we're building on top of the innovative feature set announced in recent releases, providing several enhancements that allow companies to harness trusted quality data across all environments with speed, scale, and stability. Confluent Platform 7.1 delivers three primary benefits to enable this vision.

Danica Fine:
First, it ensures globally consistent data across hybrid environments with shared schemas that sync in real time. It also improves scale and speed while maintaining operational simplicity with increased DevOps automation and expanded Tiered Storage options. And finally, it brings enhanced reliability and global resilience with additional intelligent alerts and multi-region capabilities for containerized environments.

Danica Fine:
Additionally, the enhancements delivered as part of Confluent Platform 7.1 reinforce our differentiation across the Everywhere, Cloud-Native, and Complete pillars. Let's go into the details of each of these new features.

Danica Fine:
Supporting our Everywhere pillar, we're announcing the general availability of Schema Linking, which provides an operationally simple means of ensuring globally consistent data across hybrid environments with shared schemas that sync in real time.

Danica Fine:
As I said, Confluent Platform 7.0 brought Cluster Linking, enabling you to easily link clusters together to form a highly available, consistent, and real-time bridge between on-prem and cloud environments. But operating connected clusters across environments introduces an increased need for globally enforced standards to maximize data quality. This is where Schema Linking comes into play.

Danica Fine:
Schema Linking allows you to maintain trusted, compatible data streams across hybrid and multi-cloud environments by sharing and syncing consistent schemas between independent clusters in real time. Schema Linking introduces two new concepts, which create a simple interface to interact with schemas and keep them in sync.

Danica Fine:
First, Schema Contexts are independent groupings of schema IDs and subject names, allowing the same schema ID in different contexts to represent completely different schemas. This helps to facilitate the transfer of schemas from source to destination. Second, Schema Exporters act as mini-connectors for schemas between different environments, making it easy to move schemas from one cluster to another.
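The relationship between contexts and exporters can be sketched with a small in-memory model. This is a toy illustration and not the Schema Registry API: the class, function, and subject names are hypothetical, and the assumption that an export keeps the source schema IDs inside the destination context is a simplification made for this example.

```python
from typing import Dict, List


class SchemaContext:
    """Toy schema context: its own ID sequence and its own subject names."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.schemas_by_id: Dict[int, str] = {}
        self.ids_by_subject: Dict[str, List[int]] = {}
        self._next_id = 1

    def register(self, subject: str, schema: str) -> int:
        """Store a schema under a subject and return its context-local ID."""
        schema_id = self._next_id
        self._next_id += 1
        self.schemas_by_id[schema_id] = schema
        self.ids_by_subject.setdefault(subject, []).append(schema_id)
        return schema_id

    def schema_for(self, schema_id: int) -> str:
        return self.schemas_by_id[schema_id]


def export_subjects(
    source: SchemaContext, destination: SchemaContext, subjects: List[str]
) -> None:
    """Toy exporter: copy the listed subjects into the destination context.

    IDs are copied unchanged, which is safe only because the destination
    context has an ID space separate from every other context.
    """
    for subject in subjects:
        for schema_id in source.ids_by_subject[subject]:
            destination.schemas_by_id[schema_id] = source.schemas_by_id[schema_id]
            destination.ids_by_subject.setdefault(subject, []).append(schema_id)


if __name__ == "__main__":
    on_prem = SchemaContext("on-prem-default")
    cloud = SchemaContext("cloud-default")
    on_prem.register("orders-value", "ORDER_SCHEMA")
    cloud.register("payments-value", "PAYMENT_SCHEMA")

    # Both contexts issued ID 1, but for different schemas.
    print(on_prem.schema_for(1), cloud.schema_for(1))

    # Export into a dedicated context on the cloud side, leaving cloud-default untouched.
    imported = SchemaContext("from-on-prem")
    export_subjects(on_prem, imported, ["orders-value"])
    print(imported.schema_for(1), cloud.schema_for(1))
```

Exporting into a dedicated context rather than the destination's default one is the design point: the copied IDs cannot collide with IDs the destination cluster has already issued.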

Danica Fine:
For cloud-native, we're introducing Confluent for Kubernetes version 2.3, which increases global resiliency with multi-region cluster support and enhances the API with declarative management for user-provided connectors and schema links.

Danica Fine:
With the introduction of Confluent for Kubernetes last year, users were able to build their own private cloud Kafka service using a complete declarative API to deploy and operate Confluent. In this latest release, we're introducing support for multi-region clusters, enabling customers to build a globally resilient architecture while also realizing the cloud-native benefits that come from deploying on Kubernetes.

Danica Fine:
With this addition, Confluent for Kubernetes now supports the full feature set of Confluent Platform so customers no longer have to choose between the intelligent API-driven operations of Confluent for Kubernetes and any other Confluent Platform feature that's critical to their use case.

Danica Fine:
Additionally, we're enhancing the API in two ways. First, we're now providing support to declaratively configure and manage user-provided connectors that can be installed and deployed automatically. Second, complementing the release of Schema Linking, we're also adding schema links to the list of components that can be declaratively defined and managed to simplify operations and accelerate time to value with self-managed Confluent clusters.

Danica Fine:
Also in the Cloud-Native pillar, we've expanded options for Tiered Storage to bring cloud-native scale and maximize data retention on-prem. Tiered Storage was introduced in Confluent Platform 6.0, enabling Kafka to recognize two layers of storage: local storage on the broker, and more cost-efficient object storage to which the broker can offload older topic data.

Danica Fine:
In Confluent Platform 7.1, we're introducing expanded options for long-term object storage on Nutanix, NetApp, MinIO, and Dell. This allows more of our customers to take advantage of the elasticity, operational, and performance benefits of Tiered Storage, bringing Confluent Cloud's intelligent two-tiered storage engine on-prem.

Danica Fine:
We've also put these systems through rigorous performance and quality testing while building strong relationships with the vendors to allow for better customer support.

Danica Fine:
Finally, for the Complete pillar, we're introducing new Health+ alerts around broker latency, connectors, and ksqlDB to reduce the risk of downtime. These additional alerts will help to identify and avoid issues before they result in costly downtime by using Confluent's library of expert-tested rules and algorithms.

Danica Fine:
Broker latency alerts can detect anomalies in broker latency to make sure you're well within your expected operating SLAs. Connector alerts can ensure reliable integration between Kafka and common external systems by proactively monitoring the state of your connectors. In addition to Health+ alerts, we're also including several enhancements for ksqlDB, such as pull queries on streams and custom schema selection.

Danica Fine:
In previous releases of ksqlDB, pull query functionality was only supported on tables and not streams. Because of this limitation, searching for a subset of individual messages within a Kafka topic for debugging purposes required using a low-level consumer. With the added support for pull queries on streams, developers no longer have to rely on low-level consumers and can instead focus on more value-added tasks. ksqlDB also now supports custom schema selection, bringing more options and control when using Schema Registry. By default, ksqlDB automatically generates the schema for streams and tables based on the latest registered schema for the input topic. With this update, you can have ksqlDB use a specific schema ID when you create your new stream or table rather than the latest registered schema. This allows you to specify a particular output schema for persistent queries when creating materialized views.

Danica Fine:
Also keep in mind that we're adding a number of new features which are a part of ksqlDB 0.23, supporting the ability to read metadata stored in Kafka record headers, enabling access to record partition and offset data, optimizing concurrently executed push queries, and more.

Danica Fine:
Following the standard for every Confluent release, Confluent Platform 7.1 is built on the most recent version of Kafka, in this case Apache Kafka version 3.1. That Kafka release shipped with a ton of great KIPs, including ones that extend SASL/OAUTHBEARER to support OIDC, add new, more consistent latency metrics, and enable custom partitioners for foreign-key joins in Kafka Streams. Check out our Apache Kafka 3.1 release video to learn more about these new features.

Danica Fine:
So that's Confluent Platform 7.1 in a nutshell. Download the latest version and let us know what you're building. You can use the promo code PODCAST100 to get an additional $100 of free Confluent Cloud usage for listening to this episode. If you have any questions or would like to discuss, you can reach out on our Community Forum or Slack; both are linked in the show notes. If you are listening on Apple Podcasts or other podcast platforms, leave a review and subscribe to get updates you may be interested in. Thanks and see you next time!

Intro
Real-time data sharing at Instacart
Cluster Linking
Confluent Platform 7.1 release
Schema Linking
Confluent for Kubernetes 2.3
Expanded options for Tiered Storage
New Health+ alerts
ksqlDB enhancements
Confluent Platform 7.1 launches with the latest Apache Kafka 3.1
It's a wrap