Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Hi, we’re Tim Berglund, Adi Polak, and Viktor Gamov and we’re excited to bring you the Confluent Developer podcast (formerly “Streaming Audio.”) Our hand-crafted weekly episodes feature in-depth interviews with our community of software developers (actual human beings - not AI) talking about some of the most interesting challenges they’ve faced in their careers. We aim to explore the conditions that gave rise to each person’s technical hurdles, as well as how their experiences transformed their understanding and approach to building systems.

Whether you’re a seasoned open source data streaming engineer, or just someone who’s interested in learning more about Apache Kafka®, Apache Flink® and real-time data, we hope you’ll appreciate the stories, the discussion, and our effort to bring you a high-quality show worth your time.

Rethinking Apache Kafka Security and Account Management

December 08, 2022 • Confluent, founded by the original creators of Apache Kafka® • Season 1 • Episode 246

Is there a better way to manage access to resources without compromising security? New employees need access to a variety of resources within a company's tech stack. But manually granting access can be error-prone. And when employees leave, their access must be revoked, thus potentially introducing security risks if an admin misses one. In this podcast, Kris Jenkins talks to Anuj Sawani (Security Product Manager, Confluent) about the centralized identity management system he helped build to integrate with Apache Kafka® to prevent common identity management headaches and security risks.

With 12+ years of experience building cybersecurity products for enterprise companies, Anuj Sawani explains how he helped build out KIP-768 (Secured OAuth support in Kafka) that supports a unified identity mechanism that spans across cloud and on-premises (hybrid scenarios).

Confluent Cloud customers wanted a single identity to access all their services. The manual process required managing different sets of identity stores across the ecosystem. Anuj goes on to explain how Identity and Access Management (IAM) using cloud-native authentication protocols, such as OAuth or OpenID Connect, solves this problem by centralizing identity and minimizing security risks.

Anuj emphasizes that sticking with industry standards is key because it makes integrating with other systems easy. With OAuth now supported in Kafka, this means performing client upgrades, configuring identity providers, etc. to ensure the applications can leverage new capabilities. Some examples of how to do this are to use centralized identities for client/broker connections.

As Anuj continues to build and enhance features, he hopes to recommend this unified solution to other technology vendors because it makes integration much easier. The goal is to create a web of connectors that support the same standards. The future is bright, as other organizations are researching supporting OAuth and similar industry standards. Anuj is looking forward to the evolution and applying it to other use cases and scenarios.

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo


Kris Jenkins (00:00):
Hello, you are listening to Streaming Audio and has this ever happened to you? You join a new company and you get your email account set up and your Slack account set up and other things. And sometime in the first few days you find out that there's another system that you should have access to but it hasn't been set up yet. And so you send an email to the admins and you do the permissions dance and you get access and you probably end up doing that permissions dance three or four more times during the course of your onboarding. It's certainly happened to me more than once. Well, you can solve that kind of problem in two ways. You can either set up a Word document that rigorously outlines all the different accounts that new joiners need to have and every time it fails, the new joiner adds to that document and you try and keep it up to date, or you can do it the proper way and automate the problem away.

Kris Jenkins (00:58):
Joining me today is our ambassador of doing it the proper way, Anuj Sawani, and he's going to take us through the ins and outs of centralized identity management: that one credential server to rule them all, what it needs to be able to do, how it works, how it integrates with other services in general, and Kafka in particular. Our resident KIP watchers will know that this is KIP-768, but for the rest of us, how do you do cloud security and identity management with the fewest possible headaches? Streaming Audio is brought to you by Confluent Developer and I'll tell you more about that at the end. It's also brought to you by that like button and rating button and share buttons. So if you learn something during this, that's the time to click them. But for now, let's get started. I'm Kris Jenkins, this is Streaming Audio. Let's get into it. Joining me today on Streaming Audio is Anuj Sawani. Anuj, how you doing?

Anuj Sawani (02:03):
I'm doing great. How are you Kris?

Kris Jenkins (02:05):
Very well. Looking forward to you teaching me some things about security today.

Anuj Sawani (02:10):
Yeah, happy to. It's going to be a fun talk today.

Kris Jenkins (02:13):
So you are the security product manager lead at Confluent, right?

Anuj Sawani (02:18):
Correct. Yeah. I manage all the security features that our customers can interact with on all of our products. So yeah, it's definitely a fun space.

Kris Jenkins (02:27):
Okay, well we'll get into that, but before we get there, and this isn't a pun, give me your credentials. How did you get into the world of security?

Anuj Sawani (02:37):
Various things. It dates back all the way to my days at school at Penn State and this was us working on a project which is really interesting, where we managed to figure out a way to actually bring down one of the largest set of providers out there where you just craft out your messages in a certain way. And there was actually a single point of failure on the GSM protocol, which allowed us to have a few cell phones with Linux on them and then send out a bunch of messages and to basically crowd a certain service within the GSM architecture and actually just bring that service down.

Kris Jenkins (03:20):
So you could bring down a whole cell phone network?

Anuj Sawani (03:24):
At that time. I don't know if they fixed it yet, but it was really interesting.

Kris Jenkins (03:29):
How far back is this?

Anuj Sawani (03:30):
This is, I would say at least 14 or 15 years ago. Which was, I mean at that point GSM was probably the most popular choice across all cellular providers. And it's crazy how a simple school project could potentially bring them down, and we couldn't do it because it's obviously illegal to attack a cellular network, but we demonstrated how you could do it and it was accepted as a bug.

Kris Jenkins (03:58):
Geez. Well if they haven't fixed it 14 years later, that's on them really.

Anuj Sawani (04:02):
Yeah, absolutely. I'm hoping with the whole 5G architecture and everything, things have gotten a little better, but fingers crossed.

Kris Jenkins (04:11):
Let's hope we learn things.

Anuj Sawani (04:13):
Yeah, yeah. Go ahead.

Kris Jenkins (04:16):
From there, you've spent 12 years in the security side of technology. Take me through what you've done.

Anuj Sawani (04:23):
Yeah, I've been building multiple products in various areas all the way, starting from network security where protecting that underlying infrastructure layer was so important. And then as you got into the cloud world and AWS became this big thing, the network layer started to get abstracted away further down, at least on the infrastructure side. I started to shift focus on the higher layers and I started building out security for SaaS applications, for infrastructure deployments on AWS, things like that. It's been fun and it's kind of interesting how it's evolved through me building security products where I was selling cybersecurity products to companies like Confluent. And now I'm on the absolute other end of the spectrum where I'm working for Confluent, where I'm actually sometimes procuring these cybersecurity products as well as building some of our technology around security.

Kris Jenkins (05:20):
That must be weird switching to the other side of the table.

Anuj Sawani (05:23):
It is weird. And now I'm talking to the same sales guys that I used to work with very closely. Now I'm trying to buy stuff from them, which is really interesting as well.

Kris Jenkins (05:33):
Do you find you grill them harder when you are the customer?

Anuj Sawani (05:38):
Absolutely. I know exactly what it costs. I know exactly how much discounting I should expect. It's a lot easier to deal with and it is fun. It's basically the cybersecurity world is a very small world. Everyone knows almost everyone around in that network. So yeah, it's great to have that network and take advantage of it.

Kris Jenkins (06:02):
Yeah, I bet it is. But the history there, because I'm just trying to think if you've been doing this for about 12 years, that takes us to 2010 when to my memory, everyone was just starting to do cloud stuff.

Anuj Sawani (06:18):
Right.

Kris Jenkins (06:19):
What would you say the evolution of security, the main security concerns have been over the past 10 or so years?

Anuj Sawani (06:28):
I think I would say the broadest evolution from a security perspective was earlier security meant I just protect my external perimeter. That's all that I needed to do.

Kris Jenkins (06:37):
Good firewall and I'm done.

Anuj Sawani (06:39):
Firewall and I'm done. I don't need to worry about all the stuff inside. And it's important to note that the threat actors here, or the attackers, I mean they evolve over time. Once they see that, okay, firewalls are getting better, I'm just going to target something else. And then you start to see attacks like social engineering attacks, which to this date, as you've heard from some of the attacks that have happened at potentially some of the larger, I would say these car rental companies or whatever you want to call them, they're all social engineering attacks. Where you just somehow manage to convince a user to do some action, and you just need that one person to be a little off guard and that's it. You're inside. And once you're inside you just do these lateral shifts until you get to your gold mine, which is usually the data, and that sort of attack approach, I would say, is basically now how you've seen it evolve.

Anuj Sawani (07:43):
Where it's gone from figuring out bugs on the firewall to, okay, now I'm going to just get someone to do something wrong and now I'm inside and I'm basically bypassing the perimeter. And in the cloud world, the shift has made it even easier, where the perimeter doesn't exist anymore. There is no four walls anymore, it's four walls plus five different cloud providers, plus a few other SaaS services. And that just makes it a lot easier for attackers to get through it. So you got to start thinking of the model very differently.

Kris Jenkins (08:18):
Yeah. Do you know, it scares me because every now and then I get something which almost convinces me, not quite, but every now and then you get the email that sounds a bit just slightly off and it makes you pause. But then I like to think I'm fairly tech savvy. So what about the people in sales or answering the phone? How are they supposed to cope?

Anuj Sawani (08:42):
It's very hard. It's very hard. And I've done a few of these trainings for some of the sales guys who are not that tech savvy and not to poke at them or anything, I mean they're good at what they do, but it's-

Kris Jenkins (08:55):
Just different set of expertise.

Anuj Sawani (08:56):
Different set of expertise. And I went through things like, hey, if you see that the certificate on your browser just seems weird and it's not coming from the provider. If I go to microsoft.com and the certificate actually is not issued by somebody trusted and it's not signed as a certificate for microsoft.com, there's probably a man in the middle somewhere trying to sniff your traffic. Then emails, you got to look out for the usual spelling mistakes. And that's weirdly common for these phishing emails, and I mean these are some basics that people can just do and actually get through. It's hard, but a few simple steps can go a long way. And we do this here at Confluent, but every large or enterprise IT company should be doing these fire drills where you try to attack your own employees and see how they respond, and these are all really good training exercises to get them going and to actually start to get them thinking about some of these things.

Kris Jenkins (10:05):
Yeah, sometimes you need those white hackers, right? Internally.

Anuj Sawani (10:10):
Yes. That's kind of what I called myself half the time I was doing this, but yeah.

Kris Jenkins (10:15):
So you used to be a white hacker?

Anuj Sawani (10:17):
Not officially, I mean you call yourself an ethical hacker. The example I gave about hacking into a cellular provider is kind of the same concept. You're ethical so you don't do anything wrong, but you point out the issues in the world in general.

Kris Jenkins (10:33):
And occasionally get rewarded for your efforts, not as often as you should I think.

Anuj Sawani (10:38):
That part actually it's interesting. I wish this was more common, but bounties are actually way more common now than they were in 2010, I would say. I wish they were giving out this big chunk of cash at that time and I probably would've made a chunk. It's only more common now with some of the big providers. They're essentially giving out big chunks of money to identify and report these vulnerabilities.

Kris Jenkins (11:05):
Yeah, this is slightly getting off the point, but I reckon at the moment if you actually want to make a living with security flaws, your best bet is to hack a blockchain. Because that seems to work every single week for 50 million dollars.

Anuj Sawani (11:18):
Yeah. Again, hackers are attracted to these gold mines.

Kris Jenkins (11:23):
Yeah, yeah. It's inevitable. But pulling things back on track, bring us up to date? What have you been working on recently?

Anuj Sawani (11:33):
Yeah, more recently for Confluent, I've been kind of evolving the identity and access management space for our products. And we did find out there's kind of a gap in one of our products, Confluent Cloud, which is the cloud service that we offer. And the gap was around enterprise customers: when you have your own identity provider, you have your human users and all that set up on your identity provider. Some customers also have the service users, service account kind of users, also set up on those identity providers. And then a vendor like us comes along and we ask them to set up the exact same structure within Confluent Cloud as well. We're like, "Hey, create these user accounts, create these service accounts." What ends up happening is you just end up with two different stores of identity, which is their centralized identity provider, but also Confluent Cloud.

Anuj Sawani (12:33):
Now you've actually replicated a bunch of identities in Confluent Cloud and it's a little painful where you end up having these two stores and you have to keep stuff in sync all the time. Employees leave, you got to make sure you delete these accounts. Services get axed, you got to delete those service accounts and make sure that those credentials are not used anymore. So it's keeping stuff in sync, making sure revocation works. That's just an incredibly painful management overhead that gets added and it's a security problem. We have been trying to solve and focus on that problem over the recent past and we've come up with a few good solutions, which are on their way to being rolled out right now. And we've done some work on the open source side as well to solve this.

Kris Jenkins (13:20):
Yeah. I've definitely worked in places where the principal concern was getting, you create a new user for Dave, but it's only working on one of the systems and you've got to go around and create Dave's accounts everywhere. And inevitably you forget one system that Dave needs access to.

Anuj Sawani (13:37):
Exactly.

Kris Jenkins (13:38):
Then Dave leaves and maybe he's disgruntled and have you revoked all of Dave's keys or not?

Anuj Sawani (13:44):
Exactly. And this is so frequent, you have no idea. And we are just one product in the stack. Imagine doing this for, you have Snowflake and you potentially have one of the big cloud providers. If you're not actually leveraging that centralized identity source, you are basically replicating a few of these stores. And as you just pointed out, if I forget one of them, remember I mentioned the lateral shift, I get through one, then I can jump from that one service to anything that service has access to. And that just flows across. And then that eventually you get to the gold mine that you're looking for.

Kris Jenkins (14:24):
Yeah, I've been trying to explain this to a relative of mine recently where he says, "I use the same password for all these sites that don't matter." It's like, okay, but if one of them gets hacked, someone's going to break into all of them and it's just going to be sold on a list. You just need one chink in the armor, right?

Anuj Sawani (14:42):
Absolutely. And in the enterprise world, that's way worse than a personal-

Kris Jenkins (14:47):
The money at stake, the reputational risk. So I would imagine most people listening to this have something like saved passwords in their browser or in 1Password and they have 200 separate identity management things. Are you going to tell me the enterprise world for a change is advanced beyond that?

Anuj Sawani (15:09):
Yes. Yeah. If you look at the enterprise world, most enterprises have an identity provider, as I mentioned earlier. And they do have these accounts set up, which, at a minimum, you're likely using for single sign-on. Where you log into a portal, you get redirected to some identity provider, you enter credentials and you come back and everything's authenticated and great. So at a minimum there is single sign-on implemented, but that doesn't usually solve the things like service accounts or programmatic flows. Where you've taken some sort of static keys, you've embedded them into your application and things work until either the user who's running that application leaves, or you want to actually ax that application and delete it and not use it anymore. And potentially you want to limit the access of that application.

Anuj Sawani (16:05):
For all of these events, the question is how you actually keep that in sync across the whole ecosystem of the applications that you're running. The ideal way to do this is everything needs to be centralized in one single identity provider, potentially two identity providers. And I've seen that happen with certain customers. Where you keep human users separate from your service users.

Kris Jenkins (16:27):
Oh, I see. Okay.

Anuj Sawani (16:28):
So you could have separate identity providers because the kind of features that you require for these identity providers may be different. And you could have an Okta and you could use potentially open source Keycloak, for example, as your service identity store. And I've seen that with a lot of financial customers as well.

Kris Jenkins (16:51):
How does that work for, let's go for Confluent as our handy example, you've got an external identity provider. Take me through the technical details of how we're going to reuse that identity.

Anuj Sawani (17:04):
So I'll probably walk through this with an example, we can take Kafka as an example. But before you get there, I would say there's a pretty, I wouldn't call it painful, but an involved step of setting up the identity provider. So if you don't have an identity provider set up for your applications, I think you need to get to that stage where you want an identity assigned to every single application or microservice that's running within your stack. Setting up that identity provider, creating those identities and using industry standard protocols is extremely important, where you're using protocols like OAuth for some of these things. And you could use mTLS as another option, where you're using certificates as a way to assign identity to the applications. But there is a downside to certificates, as management is painful, as we know with certificates: handing out certificates and managing time, that starts to get hard.

Anuj Sawani (18:02):
So in general, my recommendation is: get an identity provider in place and then make sure that you have the industry standard protocols enabled. You can pick whatever you like, but essentially we would recommend OAuth as the cloud native authentication standard.

Kris Jenkins (18:16):
So that's on my list of requirements when I go shopping for a-

Anuj Sawani (18:21):
Exactly. You want to support OAuth, you want to support OpenID Connect. Those are the new standards that technology providers are going to be supporting today.

Kris Jenkins (18:30):
I don't know what the difference is between OAuth and OpenID from a practical point of view.

Anuj Sawani (18:35):
Yeah, good question. So OAuth has been around a while and that became the way people exchanged credentials over the internet. You got some sort of a code, an authorization code as they call it. You exchange that code for a token and then you can use your token to make all the API calls you need to make. That was pretty standard. OAuth was, I think, a little more open ended. What ended up happening is there were a few gaps in OAuth and then OpenID Connect came along to kind of fix those gaps. And OpenID Connect added a few things, of which I think the most relevant piece to us is that it introduces the idea of an ID token. So OAuth uses a concept of access tokens and OpenID Connect introduces the concept of an ID token. And it's purely to solve the human user problem.

Anuj Sawani (19:27):
How do I use a protocol like OAuth for human users? And that's where the ID token came about. Where in the ID token, I have a bunch of key-value pairs that are sitting there and identifying attributes about a human user: what department does that user belong to, and what address, and things like groups that you could belong to and stuff like that. So these are all attributes that could be embedded with OpenID Connect inside the token.

Kris Jenkins (19:53):
So it behaves like a password with metadata attached?

Anuj Sawani (19:56):
Yes. And that's exactly kind of that transition, where OAuth very often would just use opaque tokens. It's just a string, you can't really use it. You have to actually go to an identity provider and say, "Okay, I have an opaque thing, can you tell me whether this token's valid? And can you tell me if there are other attributes attached to this token?" So it was like a secondary step you had to do to actually find out what that token meant. OpenID Connect made that easier, where I have all my metadata already built into my credential itself, and they used the format of JSON Web Tokens. So essentially this is a pretty well standardized format where I have my metadata defined and all of that metadata is signed. So I can't mess with the metadata. It's a credential with the metadata in it, it's digitally signed, and I'm using that as a credential that I pass around. And that's what OpenID Connect added as a layer on top of OAuth. So it is actually built on top of OAuth as a protocol.

Kris Jenkins (20:56):
So because it's signed but it's not encrypted. So I can't tamper with it, but I can read it.

Anuj Sawani (21:01):
And identity in general is typically not sensitive, but I can see that sometimes being sensitive as well. And identity providers let you control what sort of attributes you want to embed inside the token. So if you think there's going to be sensitive information, I would not insert it in there. With that said, there are actually standards that are extending these protocols to start encrypting some of this metadata. And I've had some customers actually ask about supporting these things as well. So people build on top of it. Some of this is actually very new. I would say most providers probably don't even support this encrypted format today.
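
To illustrate the "signed but not encrypted" point, here is a minimal sketch in Python. It is an assumption-laden toy: it signs with HS256 (a shared-secret HMAC) only because that works with the standard library, whereas the tokens discussed in this episode would typically be signed with an asymmetric algorithm. The secret, the claim names, and the helper functions are all hypothetical.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips off.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    # Toy issuer: header.payload.signature, signed with HMAC-SHA256.
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def read_claims(token: str) -> dict:
    # No key is needed here: the payload is only encoded, not encrypted.
    return json.loads(b64url_decode(token.split(".")[1]))


def signature_is_valid(token: str, secret: bytes) -> bool:
    # Recompute the signature over header.payload and compare in constant time.
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(signature))


SECRET = b"demo-only-secret"  # hypothetical key, for illustration only
token = sign_token({"sub": "orders-service", "dept": "payments"}, SECRET)

# Anyone holding the token can read the claims...
print(read_claims(token))

# ...but swapping in a different payload invalidates the signature.
header_b64, _, signature_b64 = token.split(".")
forged_payload = b64url_encode(
    json.dumps({"sub": "orders-service", "dept": "admin"}).encode("utf-8")
)
forged = ".".join([header_b64, forged_payload, signature_b64])
print(signature_is_valid(token, SECRET), signature_is_valid(forged, SECRET))
```

The forged token still decodes without complaint, which is why a receiver has to check the signature before trusting any attribute in the payload.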

Kris Jenkins (21:38):
Okay. What should we be looking out for, for the future? Is there a label we should be Googling planned support in 2024.

Anuj Sawani (21:47):
For the encryptions?

Kris Jenkins (21:49):
Yeah.

Anuj Sawani (21:50):
Potentially. So the way I look at most of these features, I wait for approval of these standards. I want them to be RFCs, published and approved. And the advantage it gets, and I actually recommend this to even technology vendors is integration becomes so much easier. You're not doing all these custom things, it's based on a standard. So let's just follow that so that the world follows the standard. The moment you start doing your own thing, now everybody starts building these shim layers, sitting in the middle trying to track stuff.

Kris Jenkins (22:20):
And you end up needing a web of connectors instead of just one agreement.

Anuj Sawani (22:25):
So there are standards around this and we will eventually support all of these things over time as we go there. So going back to I guess the Kafka side.

Kris Jenkins (22:36):
Yeah, we've got our identity provider. How do we integrate that with Kafka?

Anuj Sawani (22:40):
Yeah, so I would say there are three broad steps. The first one is actually a once in a lifetime sort of a step where you build trust. You use something called a discovery endpoint that most identity providers provide to you. And you enter that discovery endpoint into Confluent Cloud, and what that discovery endpoint provides to us is a metadata document about your identity provider. It tells us where to find your public keys, where's the token endpoint? How do I know what the issuer is of this token? So there's a bunch of metadata embedded in that discovery document that I can pull from your identity provider. And I use that so that I build that trust relationship where now I have public keys so that I can validate the token. And I can use the rest of the information to potentially parse the data that is sitting inside the token, what algorithms are supported and things like that.

Anuj Sawani (23:34):
So that's like a one time setup thing that you do with cloud and that allows you to now start using tokens sent over to us.
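
As a rough sketch of what that discovery document gives the receiving side, the Python below parses a made-up example and pulls out the fields mentioned above. The field names and the `/.well-known/openid-configuration` path come from the OpenID Connect Discovery specification; the `idp.example.com` URLs, the sample document, and the function names are hypothetical, and nothing here is fetched over the network.

```python
import json

# Hypothetical discovery document. A real one is served by the identity
# provider at <issuer>/.well-known/openid-configuration.
SAMPLE_DISCOVERY_DOCUMENT = """
{
  "issuer": "https://idp.example.com",
  "token_endpoint": "https://idp.example.com/oauth2/token",
  "jwks_uri": "https://idp.example.com/oauth2/keys",
  "id_token_signing_alg_values_supported": ["RS256", "ES256"]
}
"""


def discovery_url(issuer: str) -> str:
    # Conventional location of the metadata document for a given issuer.
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def trust_settings(document_text: str) -> dict:
    # Extract what a token validator needs: who issues tokens, where the
    # public keys live, and which signing algorithms to expect.
    doc = json.loads(document_text)
    missing = [name for name in ("issuer", "jwks_uri") if name not in doc]
    if missing:
        raise ValueError("discovery document is missing: " + ", ".join(missing))
    return {
        "issuer": doc["issuer"],
        "jwks_uri": doc["jwks_uri"],
        "token_endpoint": doc.get("token_endpoint"),
        "algorithms": doc.get("id_token_signing_alg_values_supported", []),
    }


settings = trust_settings(SAMPLE_DISCOVERY_DOCUMENT)
print(discovery_url(settings["issuer"]))
print(settings["jwks_uri"], settings["algorithms"])
```

The public keys themselves are not in this document; it only points to where they can be downloaded (`jwks_uri`).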

Kris Jenkins (23:45):
Is it genuinely as easy as giving Confluent Cloud the right URL?

Anuj Sawani (23:50):
It is. It's literally it.... And again, I don't take credit for this, this is an RFC. So there's something called a discovery endpoint with OpenID Connect, and you just give me the discovery endpoint and I'll grab what I need and you can set up the trust relationship right there.

Kris Jenkins (24:08):
I expected it to be more painful than that, but I'll trust you.

Anuj Sawani (24:11):
Yeah, no, it is pretty simple. [inaudible 00:24:16]

Kris Jenkins (24:15):
I just have to ask, is there a mirrored step where the identity provider is asking for our credentials at Confluent or is it just the one way message?

Anuj Sawani (24:24):
It's a one way message. So it's similar to certificates, where a certificate authority just gives out the public certificates to you. For example, in your browser, you just have a bunch of certificates stored. So when you visit a website, you can trust it. Same concept, it's one way, all I need is the public keys. When tokens come in, I can validate and move on. I trust you and I can check if it's valid and it's not expired, and I basically do my other things after that step.

Kris Jenkins (24:58):
Okay, cool. Right. Next step.

Anuj Sawani (25:00):
Next step. All right, so now we'll be coming into the application side, where now let's take a Kafka application. And this is really important. You do need to upgrade your clients, your Kafka clients. If you're using open source Kafka, you do need to get to the latest version. So 3.1 onwards, Apache Kafka 3.1 onwards is when we introduced support for secured OAuth.

Kris Jenkins (25:24):
Okay, that's KIP-768, for those who like their KIP numbers.

Anuj Sawani (25:27):
That is correct, yes. So with KIP-768, we worked with the community and introduced, on both the client side and the broker side, the ability for the client to send these tokens over in a secure manner and for the broker to validate these tokens in a secure manner. So both these capabilities were introduced in KIP-768, which was out there in AK 3.1. And our hope is this just propagates across the whole ecosystem, where all the different community clients that are out there eventually start taking this on. And we've already seen that happening. librdkafka is another example, where all the [inaudible 00:26:03] librdkafka-derived clients now support OAuth in there.

Kris Jenkins (26:07):
That's a lot of clients. Because that's the C library that so many use.

Anuj Sawani (26:10):
.NET, Go. Yeah. So basically the different other non-Java clients essentially now have started to support OAuth as well. And anything that now has a dependency on AK 3.1 or later basically automatically supports OAuth. So that was our first step. Let's just get the client support and the ecosystem out there. And that was the goal, to roll that out first. We rolled this out almost in January or February this year. So it's been a while and we've seen that trickle down across various vendors as well. So taking the application, that's the Kafka client. You make a bunch of configuration changes, provide a token endpoint, and you're basically good.
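
For a sense of what "a bunch of configuration changes" can look like on a librdkafka-based client, here is a hedged Python sketch that only builds a configuration dictionary. The property names are the ones I understand librdkafka to use for its OIDC token retrieval; they are not quoted in this episode, so check them against the configuration reference for the client version you run. The broker address, endpoint URL, and credentials are placeholders, and Confluent Cloud may require additional settings that are not shown.

```python
from typing import Dict, Optional


def oauth_client_config(
    bootstrap_servers: str,
    token_endpoint_url: str,
    client_id: str,
    client_secret: str,
    scope: Optional[str] = None,
) -> Dict[str, str]:
    # Assumed librdkafka-style property names; verify against your client's docs.
    config = {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "OAUTHBEARER",
        # Ask the client library to fetch and refresh tokens itself.
        "sasl.oauthbearer.method": "oidc",
        "sasl.oauthbearer.token.endpoint.url": token_endpoint_url,
        "sasl.oauthbearer.client.id": client_id,
        "sasl.oauthbearer.client.secret": client_secret,
    }
    if scope:
        config["sasl.oauthbearer.scope"] = scope
    return config


# Placeholder values for illustration only.
config = oauth_client_config(
    bootstrap_servers="broker.example.com:9092",
    token_endpoint_url="https://idp.example.com/oauth2/token",
    client_id="orders-service",
    client_secret="load-this-from-a-secret-store",
)
for key in sorted(config):
    print(key)
```

A dictionary like this is the shape that librdkafka-based clients generally accept as their configuration; the Java client uses differently named properties for the same idea.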

Anuj Sawani (26:54):
Now what the client does is it first talks to your identity provider saying, "Hey, here are my credentials. Here's who I am, can you give me an OAuth token?" And the identity provider validates the request and responds back with a token. And then the Kafka client takes that token and sends it over to us, which is Confluent Cloud. And we take this token, strip it apart. So it's usually a JWT, as we call it, a JSON Web Token. The JWT has three parts. There's a header to it, there's a payload, and there's a signature. The header just says, "Hey, this is the algorithm being used, this is the public key that's being used." The payload has all the metadata that we talked about before, all the different attributes, the key-value pairs, and then the signature is just validating all the stuff that's above it, just making sure it's not tampered with.
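
The "here are my credentials, can you give me a token" exchange is commonly done with the OAuth 2.0 client credentials grant. The Python sketch below only constructs such a request and never sends it; the endpoint URL and credentials are hypothetical, and some identity providers expect the client ID and secret in an HTTP Basic `Authorization` header rather than in the form body as shown here.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_token_request(
    token_endpoint: str,
    client_id: str,
    client_secret: str,
    scope: Optional[str] = None,
) -> urllib.request.Request:
    # Form-encoded body for the client credentials grant.
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoint and credentials; the request is built but not sent.
request = build_token_request(
    "https://idp.example.com/oauth2/token",
    client_id="orders-service",
    client_secret="load-this-from-a-secret-store",
)
print(request.get_method(), request.full_url)
print(sorted(urllib.parse.parse_qs(request.data.decode("ascii"))))
```

A successful response would typically be a JSON object containing an `access_token` and an `expires_in` lifetime in seconds, and the client then presents that token to the broker.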

Anuj Sawani (27:47):
That's what we receive. So we take the signature, validate that nothing's been tampered with, we check the public key that's being used and see, hey, do we have a trust relationship for that public key? And then we validate that and we make sure that the algorithm that's being used is supported, and we make sure that that's what we use to validate the signature as well. So it's usually RS256 or ES256 as one of these algorithms. So that's basically us validating the token, and then we take the attributes, and those attributes are going to be used to make authorization decisions. What OAuth did for us was the authentication layer, and then we take the attributes and make authorization decisions on them. What level of access should I provide this client that's coming in?
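
The checks described here can be sketched roughly as follows. This is not Confluent's implementation: the trusted issuer, key IDs, the `groups` claim, and the group-to-topic mapping are all made up, and the cryptographic signature check itself is left out because it needs a library outside the Python standard library. The sketch covers only the metadata checks (algorithm, key, issuer, expiry) and a toy authorization decision based on claims.

```python
import time
from typing import List, Optional

# Hypothetical trust settings, e.g. derived from a discovery document.
TRUSTED_ISSUER = "https://idp.example.com"
TRUSTED_KEY_IDS = {"key-2022-12"}
ALLOWED_ALGORITHMS = {"RS256", "ES256"}

# Hypothetical mapping from a "groups" claim to topics the group may write to.
WRITABLE_TOPICS_BY_GROUP = {"payments": {"orders", "refunds"}}


def token_problems(header: dict, claims: dict, now: Optional[float] = None) -> List[str]:
    # Returns a list of problems; an empty list means these checks passed.
    # Signature verification is deliberately not shown in this sketch.
    now = time.time() if now is None else now
    problems = []
    if header.get("alg") not in ALLOWED_ALGORITHMS:
        problems.append("unsupported algorithm")
    if header.get("kid") not in TRUSTED_KEY_IDS:
        problems.append("unknown signing key")
    if claims.get("iss") != TRUSTED_ISSUER:
        problems.append("untrusted issuer")
    if claims.get("exp", 0) <= now:
        problems.append("token expired")
    return problems


def may_write(claims: dict, topic: str) -> bool:
    # Authorization step: decide access from attributes carried in the token.
    return any(
        topic in WRITABLE_TOPICS_BY_GROUP.get(group, set())
        for group in claims.get("groups", [])
    )


header = {"alg": "RS256", "kid": "key-2022-12"}
claims = {"iss": TRUSTED_ISSUER, "exp": 2_000, "groups": ["payments"]}
print(token_problems(header, claims, now=1_000))
print(token_problems(header, claims, now=3_000))
print(may_write(claims, "orders"), may_write(claims, "audit-log"))
```

Keeping the two functions separate mirrors the split described above: authentication establishes that the token is genuine and current, and authorization then decides what its attributes allow.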

Anuj Sawani (28:33):
That's the second step. Basically grabbing the token, sending it over to Confluent. I would say the second and third step, I kind of mixed the steps up out there, but the second step was grabbing the token from your identity provider. The third step is actually sending it over to Confluent Cloud over OAuth [inaudible 00:28:54]. That's basically the Kafka protocol and sending it over to us. And then we do all the validations on our side automatically and ensure the session gets opened.

Kris Jenkins (29:03):
So is the connection from Kafka to the identity provider a one-time thing?

Anuj Sawani (29:11):
It is... Could be, but it will definitely be multiple times over a longer period, because tokens are short-lived credentials, which is kind of the reason why we like these things. If somebody ever steals a token, my risk lasts only as long as the token is valid. So if I have a one-hour token and it gets stolen, it cannot be used beyond that one hour. In the Kafka client, again as part of KIP-768, what we did is build an automated token refresh into the callback handler, so the Kafka client keeps a valid token at all times. So if the Kafka client ever needs to reestablish or renegotiate a session, it always has a valid token available. So as-

Kris Jenkins (30:00):
It might call the identity provider again to refresh it transparently.

Anuj Sawani (30:04):
And it doesn't wait all the way till the end of the expiry; it actually does it at close to 75 or 80% of the lifetime of the token.

Kris Jenkins (30:10):
Oh, like preemptive refreshes.

Anuj Sawani (30:12):
Exactly, yeah.

Kris Jenkins (30:14):
That's smart.

Anuj Sawani (30:14):
To ensure that you basically don't get to a point where the latency is too high towards the end, and you're just waiting, and now suddenly you don't have a valid token for a short period of time.
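
The preemptive refresh described in this exchange comes down to simple arithmetic: refresh once a fixed fraction of the token's lifetime has elapsed. The sketch below assumes a factor of 0.8, which matches the default of the Java client's `sasl.login.refresh.window.factor` setting; the function name is made up, and the real client also applies jitter and minimum-period rules that are omitted here.

```python
def seconds_until_refresh(issued_at: float, expires_at: float, now: float,
                          window_factor: float = 0.8) -> float:
    """Seconds to wait before refreshing a token.

    The refresh point sits at `window_factor` of the token's lifetime,
    so a new token is fetched well before the old one expires.
    Times are seconds on any shared clock (for example, Unix time).
    """
    lifetime = expires_at - issued_at
    refresh_at = issued_at + lifetime * window_factor
    # Never return a negative wait: if we are past the point, refresh now.
    return max(0.0, refresh_at - now)


if __name__ == "__main__":
    # A one-hour token checked at the moment it was issued.
    wait = seconds_until_refresh(issued_at=0, expires_at=3600, now=0)
    print(f"refresh in about {wait / 60:.0f} minutes")
```

For a one-hour token this puts the refresh at roughly the 48-minute mark, leaving about 12 minutes of slack if the identity provider is slow to respond.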

Kris Jenkins (30:25):
So this has been battle tested to get to that point?

Anuj Sawani (30:29):
Yes. [inaudible 00:30:29] Industry standard, which is great. It gives us so much guidance. It's proven in the battlefield, for sure.

Kris Jenkins (30:35):
And presumably you're periodically re-calling that discovery endpoint to check if anything's changed, if certificates have been updated.

Anuj Sawani (30:44):
Great point.

Kris Jenkins (30:44):
Public key.

Anuj Sawani (30:45):
So public key leaks do happen, or rather private key leaks, and these are asymmetric keys. So if the private key leaks or is breached for some reason, these identity providers do have to rotate their keys. And they do that in general as good security practice; they will typically rotate once a year. Yes, we do have to keep reaching out. Some good identity providers provide you a cache time where they tell you, "Hey, cache the public key only for this period of time and come back and get a brand new public key, or just refresh your public key if needed." So it gives us a timer, but there are other identity providers who are not that nice and they just don't send a cache timer. What ends up happening is we just have to keep making periodic calls where, once a month, we go and get the latest key, which typically hasn't changed unless there's been a breach. So yeah, we do have to keep reaching out to the discovery endpoint to ensure we have the latest keys.
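
The caching behavior described here (honor the provider's cache timer when there is one, otherwise fall back to a fixed refresh period) might look like the sketch below. It is an illustration under assumptions: the `fetch` callable stands in for a real HTTP call to the provider's key endpoint, and the class name, the 30-day fallback, and the key-set shape are all hypothetical.

```python
import time

THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # fallback when no cache timer is sent


class KeyCache:
    """Caches an identity provider's public keys with a time-to-live."""

    def __init__(self, fetch, default_ttl_seconds=THIRTY_DAYS_SECONDS,
                 clock=time.monotonic):
        # fetch() must return (keys, ttl_seconds), where ttl_seconds is None
        # when the provider gave no cache hint.
        self._fetch = fetch
        self._default_ttl = default_ttl_seconds
        self._clock = clock
        self._keys = None
        self._expires_at = 0.0

    def get_keys(self):
        """Return cached keys, refetching when the cache is empty or stale."""
        now = self._clock()
        if self._keys is None or now >= self._expires_at:
            keys, ttl = self._fetch()
            self._keys = keys
            self._expires_at = now + (ttl if ttl is not None else self._default_ttl)
        return self._keys


if __name__ == "__main__":
    def fake_fetch():
        # Stand-in for a network call; this provider sends a one-hour hint.
        return {"demo-key-1": "public-key-material"}, 3600

    cache = KeyCache(fake_fetch)
    print(cache.get_keys())
```

A production verifier would likely also refetch when a token names a key ID that is not in the cache, so that an emergency rotation is picked up before the timer runs out.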

Kris Jenkins (31:48):
That makes sense to me. And would you expect to be able to use this same... I mean, as we talked about standards and connections, would you expect to have this discovery endpoint set up once and most of your cloud providers would be able to use it?

Anuj Sawani (32:05):
Yeah, the discovery endpoint is set up by the identity provider. So yes, that is absolutely the goal. Where as an application, I'm actually talking to multiple services. I'm not just talking to Kafka here. I would likely be talking to Snowflake, I would likely be talking to maybe a database service. At the end of the day, it's one application, one identity across all these different providers. You want to be able to centralize that. When I have an audit log that's coming from four different providers, I can correlate all that information to one single identity rather than me trying to do some magic stitching that I have to do with my audit logs. It really helps to have that single identity. Yes, you want all your cloud providers talking to the same identity provider, you set it up once, it's the same discovery endpoint for all of them, and that's really the way to go. You centralize this as much as possible.
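
The audit-log benefit can be shown with a small sketch: when every provider records the same identity, correlating events is a plain group-by rather than "magic stitching." The event records, their field names (`provider`, `subject`, `action`), and the function name are invented for illustration and do not reflect any vendor's actual audit log format.

```python
from collections import defaultdict


def correlate_by_identity(events):
    """Group audit events from several providers by the shared identity."""
    by_identity = defaultdict(list)
    for event in events:
        by_identity[event["subject"]].append((event["provider"], event["action"]))
    return dict(by_identity)


if __name__ == "__main__":
    # Hypothetical events; the same subject appears across all providers.
    audit_events = [
        {"provider": "kafka", "subject": "orders-service", "action": "produce"},
        {"provider": "snowflake", "subject": "orders-service", "action": "query"},
        {"provider": "database", "subject": "billing-service", "action": "read"},
    ]
    for subject, actions in correlate_by_identity(audit_events).items():
        print(subject, actions)
```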

Kris Jenkins (32:57):
Do you get, I can imagine some people, I'm thinking of banks here, having a centralized identity provider that's kind of old and doesn't support this. What do they do? Do they end up building their own proxy to support just this part?

Anuj Sawani (33:17):
I think you've touched upon two things, which is interesting. Weirdly, and I kind of expected this when we built this out. Which is a lot of these banks already have OAuth providers in house.

Kris Jenkins (33:30):
Really?

Anuj Sawani (33:31):
And they're already using it for a lot of their internal services. They're not using it for external services very often. All the microservices are already doing this dance of OAuth and all these things. So it's very common for a bank to have this. But you called out the second part of this, which is it's not very often exposed externally. They end up doing a proxy where they end up somehow poking a hole in the firewall, which is a huge exception for a bank, typically. They would do a proxy and kind of lock it down to the IP space of Confluent, for example, to say, only Confluent can go grab stuff from this URL. There are various controls that they think about when they open this. So yeah, as you said, they are very particular about this. When they open it up externally, they want to make sure there are controls in place for them to secure it a little better than some of the other customers we work with.
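
The "lock it down to the IP space" control can be sketched as a simple allowlist check in front of the key endpoint. The network ranges below are reserved documentation ranges used as placeholders, not Confluent's actual egress addresses, and the function name is made up; a real deployment would more likely enforce this in the firewall or proxy configuration than in application code.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for the
# vendor's published egress IP space.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True when the caller's address falls inside an allowed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("203.0.113.10", "192.0.2.1"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```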

Kris Jenkins (34:28):
But the ideal is that you would expect this to be publicly available?

Anuj Sawani (34:33):
Yeah, I mean ideally you should not have sensitive in... It's a public key. All I need is a public key, so this should not be sensitive in any way. Like an attacker can't do anything with a public key, really.

Kris Jenkins (34:47):
If they can, you've got bigger problems.

Anuj Sawani (34:49):
Yes, exactly. But there is some, I would say a very minor amount of metadata tied to the discovery endpoint. There could be some extra information that a bank doesn't want to be externally available. I can see their point, but at the end of the day, this isn't a huge security issue, even if it is public. And Okta, Google, all of their discovery endpoints today are actually public endpoints. So unless you're using an on-prem or internal identity provider, they're typically public anyway.

Kris Jenkins (35:22):
So that leads me into, I guess, the final step of this. What advice would you give someone who wants to move into this "we just have one set of credentials in one centralized place" world?

Anuj Sawani (35:38):
A few suggestions. One is around the identity provider, and we spoke about this: ensure you're using industry standards; it'll make your life so much easier over time. So you're not, as I said earlier, building these layers sitting in the middle, translating stuff around. And over time, I'm hoping the world eventually converges on these standards, though I know it's not a hundred percent, but at least 75% of the vendors out there are likely using these standards. Standards, it's really important. The second part is on the Kafka side, I would say the clients. So if you have an existing deployment, it's not easy going to each one of these application teams, getting them to upgrade, asking them to change the configuration. Upgrades are hard, and we know this, and if you want to use something like OAuth, that's going to be a long cycle.

Anuj Sawani (36:37):
Ensure you get started on that as soon as possible: the upgrade cycle, reaching out to these app teams, getting their code up to date with the latest version of AK or whatever community client you're using. Ensure you start that cycle at a minimum. If you have a new deployment, great. Then start from that version onwards so that you don't have to worry about it later. And it's totally fine to have a mixed environment. You can have OAuth and a different authentication provider for a short transition period. And that's totally fine because transition takes time, and OAuth is, I would say, open heart surgery in a way.

Kris Jenkins (37:14):
What makes you say that?

Anuj Sawani (37:17):
It is. I mean, the transition, as I said, you've got to do client upgrades, you've got to do configurations, you want to have an identity provider set up. It's not as simple as switch it on and I'm done. So it's not something like TLS where I just turn on encryption and everything's great. This involves changes on the application, so you want to be able to start that process early. It's okay to have a transitionary period and move over and switch over time. Those are two things I would call out. Use industry standards, start this process and plan this out effectively. And your end outcome should be what we described earlier, which is centralized identity and access management in one place across all your vendors. That's what should be your end goal here.

Kris Jenkins (38:05):
That makes perfect sense to me. One place to rule them all and in the darkness to bind them.

Anuj Sawani (38:12):
Yes, exactly. You know where to find them. It's only one place; you're not hunting around every single room, opening doors and trying to find them.

Kris Jenkins (38:20):
Yeah, and when you kill them, you kill them dead.

Anuj Sawani (38:22):
Yes, exactly.

Kris Jenkins (38:23):
Cool.

Anuj Sawani (38:25):
It almost sounds like a first person shooter game at this point.

Kris Jenkins (38:28):
Yeah, yeah. I'm often guilty of falling into a video game metaphor.

Anuj Sawani (38:36):
Great.

Kris Jenkins (38:36):
But security is that high stakes, right? Security is that kind of...

Anuj Sawani (38:39):
Yeah. Yeah, absolutely.

Kris Jenkins (38:41):
Well, thank you. I think I've got the diagram in my head. I hope the listeners have. Anything else we should bear in mind before we leave you?

Anuj Sawani (38:49):
No, I think the future has a lot to hold. I would say authentication is fairly standardized, but authorization is not so standardized and I'm hoping it eventually gets there. There are actually fewer, I would say, RFCs on authorization itself, but I'm starting to see some new drafts come up, so I'm tracking that very closely. And we're hoping to adopt some of those newer standards over time around authorization to at least make authorization easier as well, which I know today is sometimes a pain point with certain vendors. But we are trying to make that as easy as possible over time. The future is pretty strong and I'm truly looking forward to evolving that for Confluent as well.

Kris Jenkins (39:31):
Oh, cool. Well, we'll stay tuned then.

Anuj Sawani (39:31):
Yeah.

Kris Jenkins (39:31):
Anuj, thanks very much for joining us.

Anuj Sawani (39:38):
Yeah, no, absolutely. It was great to talk to you today as well, Kris.

Kris Jenkins (39:42):
Thank you, Anuj. There are lots of good reasons to make sure your Kafka clients and brokers are up to date. I know there's a lot of excitement at the moment about Zookeeper and the last days of Zookeeper and rightly so. But also identity management. Let's make that simple. Let's upgrade to 3.1 or higher and just make that work seamlessly. That would be a good thing for everyone. Before we go, Streaming Audio is brought to you by Confluent Developer, which is our site that teaches you everything you want to know about Kafka and event systems, including quite recently, some cloud security courses.

Kris Jenkins (40:21):
So if today's episode has left you hungry for more cloud security, check it out at developer.confluent.io. If you've enjoyed today's episode and learnt something from it, we would very much appreciate a like or a rating or a review, all those things that feed into the algorithms because they tell us which episodes you found most useful and they tell other people how they're going to find us. So if you've got a second, take a moment please to click the clicks. And with that, it just remains for me to thank Anuj Sawani for joining us and you for listening. I've been your host, Kris Jenkins, and I will catch you next time.