Streaming Audio: Apache Kafka® & Real-Time Data

5 Years of Event Streaming and Counting ft. Gwen Shapira, Ben Stopford, and Michael Noll

August 31, 2020 Confluent, original creators of Apache Kafka® Season 1 Episode 117

With the explosion of real-time data, Apache Kafka and event stream processing (ESP) have proliferated, and event streaming has become the de facto technology transforming businesses across numerous verticals. Gwen Shapira (Engineering Leader, Confluent), Ben Stopford (Senior Director, OCTO, Confluent), and Michael Noll (Principal Technologist, Confluent) meet up to talk all about their last five years at Confluent and the changes they’ve seen in event streaming. They discuss what they were doing with Apache Kafka® before they arrived at Confluent, the challenges in event streaming that have arisen, and their favorite use cases. They then talk through what they think the Kafka community is undervaluing and where they think event streaming will be in the next five years.

Tim Berglund (00:00):
Five years at a startup as young as Confluent is a pretty long tenure. Gwen Shapira, Ben Stopford, and Michael Noll have just passed their five-year anniversaries, and these three characters spent some time with me in the virtual studio to reflect on how the event streaming ecosystem has changed in that time. It's all in today's episode of Streaming Audio, a podcast about Kafka, Confluent, and the cloud.

Tim Berglund (00:29):
Hello and welcome back to another episode of Streaming Audio. I am, as always, your host Tim Berglund, and I'm joined today by three friends and co-workers: Michael Noll, Gwen Shapira, and Ben Stopford.

Tim Berglund (00:43): 
Michael, Gwen, and Ben, welcome to the show.

Ben Stopford (00:46):
Thank you, Tim.

Michael Noll (00:46):
Hello everyone.

Gwen Shapira (00:46):
Tim.

Michael Noll (00:50):
Now it just happened, Tim, what you told us we should not do: we should speak one after the other, and not all at the same time. We apologize for that.

Tim Berglund (00:57):
That's okay, but this is such a great lesson in media access protocols, and it's really not clear how to do this without being able to look at each other. There's all kinds of things you do in a room to not talk over each other, on a podcast when it's just a microphone, it's an adventure.

Tim Berglund (01:13):
Anyway, hey the three of you have been on the show before and obviously we'll link in the show notes to your previous episodes and Gwen, you have a regular show of your own, Ask Confluent, which I have had the privilege of being a guest on, I think twice, and it's been wonderful both times. But you are all here together because you are about, right now, within a few weeks of right now, this is your fifth anniversary at Confluent, is that right?

Gwen Shapira (01:43):
Yeah. We all joined in early August 2015.

Tim Berglund (01:47):
Wow, okay, so I'm coming up on three-and-a-half years, which feels like a long time, feels very senior relative to most people in the company, but five years, you guys are early so-

Ben Stopford (01:59):
And we all joined on the same day, randomly.

Tim Berglund (02:02):
Did you really?

Ben Stopford (02:03):
Yeah.

Gwen Shapira (02:04):
Yeah.

Michael Noll (02:04):
Yeah.

Tim Berglund (02:05):
Oh.

Gwen Shapira (02:05):
And Michael sent cupcakes to the office.

Michael Noll (02:06):
Yeah.

Tim Berglund (02:09):
That is amazing.

Michael Noll (02:09):
You remember that Gwen, I actually forgot it. Thanks for remembering.

Gwen Shapira (02:15):
Everything was downhill from there.

Tim Berglund (02:18):
In my first week, I made margaritas for the office.

Ben Stopford (02:21):
I remember because I just remember being very disappointed, I didn't get any cupcakes, just because I wasn't in the office. There was no cupcakes sent to the London office, and I would just like to say, Michael, that's been noted.

Michael Noll (02:38):
You know what my very good apology for this is? Or rather my very good excuse? There was no Confluent office in London then. [crosstalk 00:02:46].

Tim Berglund (02:48):
That was going to be a follow-up question, Michael, just so you know, I was thinking, wait, wait, was there even one? But apparently not.

Ben Stopford (02:55):
He couldn't. It's funny you say that, but on Friday I actually got a notification from that clever Google photos thing, with a photo of the first Confluent office, which was actually opened five years ago on Friday, unofficially.

Tim Berglund (03:13):
Really?

Ben Stopford (03:13):
Well it was just, when I saw Confluent office, it was-

Tim Berglund (03:16):
Yeah, it was more of a lean-to structure in-

Ben Stopford (03:22):
Yeah, yeah a fancy bivouac.

Tim Berglund (03:23):
-Pall Mall, sort of a tarpaulin tarp with a branch and yeah.

Ben Stopford (03:31):
Yeah, and a little hand-driven generator for the warm dryer.

Tim Berglund (03:33):
Yeah, you had to take turns, back in those days. Yeah it was an A-round company, you've got to be scrappy. Anyway, we thought it would be good to get you folks on, and really just talk about what the last five years have been like.

Tim Berglund (03:51):
And I wanted to start with, and I always like to ask guests this, I know I've asked all of you this before but how you got to where you were when you joined, or if I could say a little more specifically, since you've all told that story on-air before, what were you doing with Kafka, just before joining Confluent?

Tim Berglund (04:13):
And I'm going to make the media access explicit in this case, I'm going to just go down the list in the order in which I see you on my screen here in the podcasting tool, so Michael what were you doing with Kafka, before joining Confluent?

Michael Noll (04:27):
So if I recall correctly, at the company I was working at the time, which was another US company, but I was based in Europe, we were one of the first users of Kafka over in Europe and at the same time because it was a struggle initially, I started writing articles on my blog of how to use Kafka, and a few other distributed systems, that turned out to be quite popular. So that was my initial foray into Kafka, and then we started to use it at a pretty large scale, particularly for the times then, for some real-time processing use case like cyber defense, cyber security and so on. And then subsequently I decided to join Confluent, which was about the time when Confluent was founded. Jay reached out to me because we previously collaborated on some Kafka documentation I think, and then those Kafka blogs were becoming popular that I had on my website, so I think that was the origin for me and Confluent, if I recall correctly. What about you, Gwen?

Gwen Shapira (05:28):
Yeah, so I was with Cloudera at the time, and about I think six months earlier, Cloudera decided to offer Kafka support for its own customers, and they needed an engineer to fix bugs, advise customers on the right ways to use Kafka, how to make things [inaudible 00:05:52], these kinds of things, I was that engineer and that's how I got involved in the Kafka community. I mean you can't really fix bugs without talking to a lot of people in the community.

Gwen Shapira (06:02):
I think the first Kafka KIP was mine, way back when. Yeah, so that's how I got to know people, and then when Confluent was founded, they reached out to me a few times. At some point Cloudera decided to move Kafka development to Budapest, and that looked like a good time to move.

Tim Berglund (06:23):
You said, move Kafka development to Budapest?

Gwen Shapira (06:26):
Yes.

Tim Berglund (06:28):
Oh, okay which is a lovely city.

Gwen Shapira (06:32):
Yeah, it is a lovely city. I just felt that being at a company where Kafka is not the thing to offshore, but is the thing, was a good move.

Tim Berglund (06:46):
Maybe better aligned with your interests.

Gwen Shapira (06:48):
Yeah. I don't regret that.

Tim Berglund (06:50):
Yeah, no good, good. I've flown my drone in Budapest, I just want to get that out there.

Gwen Shapira (06:54):
Ooh. Nice.

Tim Berglund (06:57):
Okay, I'm obligated to put a link in the notes now to that video.

Gwen Shapira (07:00):
Yes.

Tim Berglund (07:00):
Ben. Ben, what were you up to before you joined?

Ben Stopford (07:03):
So I came from, kind of the other way around actually. I worked for an investment bank for quite a while, say for about seven years. I spent about six of them building... What was it? A data platform, it was really a database that sat in the middle of the company. This pattern is quite popular in finance, there are a few different implementations, but what was interesting about this database, this somewhat bespoke database, and it had a messaging system, was that it was the system of record.

Ben Stopford (07:46):
So it had a view you could run queries against, and it had a schema and stuff. And when you wrote data, it actually committed it to a messaging system, and a file system. And the idea was to try and join these two concepts together.

Ben Stopford (08:02):
It was also a special type of database, it's called a bitemporal database, which yeah, not that well known but actually a little bit underrated in my opinion. So the really nice thing about this type of database is that you keep all of your events that occur in that raw format and then you can receive an event, and you can use that event or you can go back and query the database basically at the time, the exact time that the event was created.

Ben Stopford (08:35):
So for example, if you want to look up, in this case, maybe what the counterparty information was or something like that, you can tie those two things together. It started off very much as a central database for integration, with this messaging system as a secondary, and the messaging system ended up very much becoming the primary thing, which is unsurprising, I guess, from an integration perspective. And then you had this database that provided more like a materialized view. It was a bit strange because you wrote through it as well as creating, being able to materialize things asynchronously.
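To illustrate the bitemporal lookup Ben describes, here is a minimal Python sketch. It is not the bank's system: the class `BitemporalStore`, the `as_of` method, the integer timestamps, and the sample data are all hypothetical. The assumption is that each write carries two times, when the fact became true (`valid_from`) and when the store learned it (`recorded_at`), so a reader can ask what the data looked like at the moment an event was created.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: int   # when the fact became true in the business world
    recorded_at: int  # when the store learned about the fact


class BitemporalStore:
    """Append-only store: nothing is overwritten, every version is kept."""

    def __init__(self):
        self._versions = []

    def write(self, key, value, valid_from, recorded_at):
        self._versions.append(Version(key, value, valid_from, recorded_at))

    def as_of(self, key, valid_at, known_at) -> Optional[str]:
        """Return the value valid at `valid_at`, using only what was known at `known_at`."""
        candidates = [
            v for v in self._versions
            if v.key == key and v.valid_from <= valid_at and v.recorded_at <= known_at
        ]
        if not candidates:
            return None
        # Prefer the latest validity period, then the latest correction to it.
        best = max(candidates, key=lambda v: (v.valid_from, v.recorded_at))
        return best.value


store = BitemporalStore()
store.write("acme-corp", "rating=A", valid_from=10, recorded_at=10)
store.write("acme-corp", "rating=B", valid_from=20, recorded_at=25)
# A late correction: at time 30 we learn the rating from time 10 was really A-.
store.write("acme-corp", "rating=A-", valid_from=10, recorded_at=30)

# What did we believe at time 15 about time 15? Versus what do we believe now?
print(store.as_of("acme-corp", valid_at=15, known_at=15))
print(store.as_of("acme-corp", valid_at=15, known_at=30))
```

An event stamped with its creation time can be joined against `as_of(key, t, t)` to see the reference data exactly as it stood when the event was produced.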

Ben Stopford (09:13):
But yeah, it was a pretty interesting piece of technology because it could support very high throughput. You could hit it from a compute grid and it could do very fast joins. But it also had this event-driven side, and for that we used to use TIBCO. And to get it to work we basically had to shower it with lots of versions of TIBCO together. And this all started in about 2008/2009 and then obviously Kafka pops up a bit later.

Ben Stopford (09:44):
By this point in time, you've realized well the database part, that's pretty useful but actually sharing databases between lots of parts of a company is actually pretty hard to do, like walk that path is hard. You can do it, but it's hard. And actually what became apparent was that having a messaging system but with that ability to query or replay data so you can materialize different views, was a very powerful pattern, particularly when you've got something complicated like a big organization.
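The pattern Ben describes, a log that can be replayed to materialize different views, can be sketched in a few lines of Python. This is an illustrative toy, not Kafka's API: `EventLog`, `read_from`, the two view functions, and the event fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict


class EventLog:
    """A toy append-only log. Readers choose the offset they start from."""

    def __init__(self):
        self._events = []

    def append(self, event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset=0):
        for i in range(offset, len(self._events)):
            yield i, self._events[i]


def balances_view(log):
    """One consumer replays the log to build per-account balances."""
    balances = defaultdict(int)
    for _, event in log.read_from(0):
        balances[event["account"]] += event["amount"]
    return dict(balances)


def activity_view(log):
    """Another consumer replays the same log to count events per account."""
    counts = defaultdict(int)
    for _, event in log.read_from(0):
        counts[event["account"]] += 1
    return dict(counts)


log = EventLog()
log.append({"account": "a", "amount": 100})
log.append({"account": "b", "amount": 50})
log.append({"account": "a", "amount": -30})

print(balances_view(log))
print(activity_view(log))
```

Because the log keeps the raw events, a new team can add a third view later by replaying from offset 0, without asking the producers to change anything.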

Ben Stopford (10:13):
So that's how I came to Kafka the other way around, trying to solve a problem, certainly the central nervous system problem without it, and then realizing that Kafka was the solution we needed in the first place. And then ended up coming out to Confluent, then I guess to try and help other people and help us build the technology that made that vision a reality.

Tim Berglund (10:42):
Wow, so that's really making the event log the system of record, and creating materialized views around it, maybe in an unconventional way, in that case with a bitemporal database, but that basic pattern is a dream you were living five years ago and before, which in terms of people I talk to in the community and among Confluent customers, that's a dream people are still trying to make real. People are doing it, it's not insanely avant garde like it was five years ago, but it's hardly a completed vision. That's cool. I actually didn't know that's what you were doing beforehand, I thought you were the drummer in a rock band that played in the Surrey music scene.

Ben Stopford (11:29):
I think we should have a thing where we avoid the term, Surrey in this podcast, or actually in general. It's like a no-go area, you know? It's a bit like Iowa, or somewhere.

Tim Berglund (11:42):
I was just going to say maybe Terre Haute, Iowa you know? That sort of thing. And I don't even know if I'm pronouncing it the way the people there do. It's actually Indiana, it's not Iowa, but even Americans sometimes get those confused. I am going to get tweets about this but let's not cut it out.

Tim Berglund (12:00):
So all right, and Ben, we had established that Confluent, London five years ago was kind of a tarpaulin and a hand-cranked generator and an open fire, that you made cowboy coffee over. Of course the London office these days is quite well-appointed, modest but professional.

Tim Berglund (12:20):
Gwen, what was Confluent like five years ago when you joined?

Gwen Shapira (12:26):
Yeah, it was a lot fewer people but percentage-wise I think mostly engineers. I think we had Sabrina from HR, Christa running the office, and Gretch doing marketing, Kylie doing sales... No, Kylie actually joined after us. So yeah, we didn't even have sales at the time.

Gwen Shapira (12:50):
And most of us in the US fit into a single room and that was our office, we rented a room. It was a dentist's office and then he had a spare room and we rented it out in downtown Mountain View. And it had this kind of conference room but only one of them, and only one restroom. So we actually spent a lot of time in nearby coffee shops. You couldn't really talk to anyone in private.

Tim Berglund (13:21):
Right, right if you wanted to. Of course, during the pandemic it's been six months since I've been, since anybody's been to an office.

Gwen Shapira (13:29):
Yeah, I was thinking, do we even remember what offices were like. But they were quite nice, you could overhear a lot as the people are walking on, and had those chance run-in conversations, [inaudible 00:13:41].

Tim Berglund (13:40):
Right, right and not just in the office, but when we were in Palo Alto at Cooper, a block away, if you wanted to have a private conversation with somebody, for darn sure, you did not go to Cooper, because-

Gwen Shapira (13:52):
Yeah, it was like our second office, pretty much.

Tim Berglund (13:56):
Right because there was like 15 Confluent people, that's a fully-owned subsidiary and a delightful place, by the way.

Gwen Shapira (14:00):
Fully-owned subsidiary.

Tim Berglund (14:02):
This podcast has been brought to you by... Michael?

Gwen Shapira (14:06):
[crosstalk 00:14:06].

Michael Noll (14:10):
Yeah for me, what can I say? My first reaction to your question Tim, was wow Confluent was tiny back then, just like Gwen said. I remember that I literally built my own desk. This was not that surprising because I've been working from remote here in Switzerland since the first day I joined Confluent, so unlike Gwen, I had the coronavirus pandemic feeling already, since the past five years.

Michael Noll (14:33):
So that is how I remember Confluent, it was just a very tiny team back then. You knew everybody at least for the first one or two years, even when you were working remote, like I do. Now that we are more than 1,000 people, it is certainly no longer the case. You can answer your question Tim, in a few different ways but my personal point of view as an employee is what I just said. And personally for me, this was my first experience in this stereotypical hyper-growth Silicon Valley startup, and I can tell you it has been quite a ride thus far, with ups and downs, many challenges and a lot of fun.

Michael Noll (15:06):
And it's certainly something that you can never experience by just reading a book about it.

Gwen Shapira (15:11):
Yeah.

Ben Stopford (15:12):
Yeah, yeah.

Michael Noll (15:13):
For example, and there are many examples that I could bring up here, but I'll just name one, you really get spoiled by the talent of your colleagues. So today it would be very difficult for me to be at a place where that bar is lower. What about you, Ben?

Ben Stopford (15:29):
Yeah, I agree. I think my overarching memory is I feel like slightly dysfunctional as I'm sure most startups are, but there's something that was really cute about it. It was just like a small number of people, you maybe didn't really get the feeling that we really knew what we were doing. We felt like what we were doing is really important, so we were just getting on, and doing it.

Ben Stopford (15:57):
Yeah, I don't know. Probably my overriding emotion when I look back at that time was like, it was just cute, maybe certainly naïve, but cute and I have very fond memories of that time.

Tim Berglund (16:09):
Sure, coming from an investment bank, it would seem cute. And you're right-

Ben Stopford (16:15):
Yeah, if there's one adjective that is very, very rarely used to describe investment banks, it would be cute.

Tim Berglund (16:22):
Nobody says that.

Ben Stopford (16:24):
Nobody says that, it's true but we said it here first on Streaming Audio.

Tim Berglund (16:29):
We did, we did and this recording will endure. And I think certainly, early stage startups and just high-growth startups, any company, you can always find the pockets of dysfunction, but as startups go, I could just say, as dysfunction in startups go, you have no idea. And I'm not just saying this because this is a Confluent podcast and I work here but wow there are so many things, bad things that startups can do, and we just don't do those bad things very well, and that's good.

Ben Stopford (17:04):
Yeah, I will say that, that ethos we have today, that the founders really instilled, of smart and humble and just really feeling like everyone had the right values. That was always there right from the start, I thought. And that's been a big reason why I have stayed so long.

Gwen Shapira (17:23):
I am thinking about it, like we were almost grownups. The founders were in their mid to late 30s I think, and most early hires were that range, and it felt like just working with adults. When you hear the horror stories about startups, a lot of times it's basically college kids doing the college experience in the company and we never really had that.

Michael Noll (17:54):
Yeah, you're right Gwen, I remember that as well now that you're putting it like that. When people ask me about my experience at Confluent, for a Silicon Valley startup, that I said, "Surprisingly, the average age of our team is actually a bit higher than what I've seen in the news, et cetera from other companies," just like you said.

Gwen Shapira (18:13):
Yeah, when I was still working on Oracle, I was consulting to a startup and I remember being the oldest person around and I was 29 or 28 at the time. And I was like everyone's mum trying to keep everyone in line, so yeah.

Tim Berglund (18:34):
Oh yes. I'm interviewing you, this is not about my story but I can't resist. The first startup I went to work for, I got the job offer on my 40th birthday and it was six or so weeks later when I started. And I got there and it was anything around 100 people at the time, and I was I think one of the oldest three people in the company. And that stayed true for the next 200 people worth of growth. And it's not bad, I'm perfectly comfortable, I don't work to keep up with the young people, I invite the young people to try to keep up with me. But there is maybe a certain distribution that is healthy of different sorts of people and I think we do a good job realizing that.

Tim Berglund (19:21):
Now, that's Confluent, how about event streaming itself? Ben, I want to start with you. You had an interesting story about what you were doing five years ago, but what do you think has changed the most in event streaming since you started?

Ben Stopford (19:38):
Ooh, what's changed the most? Yeah, I guess there's quite a lot in there but it's a big thing, but it's probably the thing that I noticed the most is actually just the size of the community, it's just become so big. When I say the community, I guess I mean the number of people that are contributing to the whole idea and that's what makes it, right? The size of that idea and people are pushing it in different directions. You got people building microservices and building platforms at the centers of companies, and building big data systems and this is something that wasn't really... The seeds were there five years ago but it wasn't prevalent.

Ben Stopford (20:33):
So yeah, for me I think it's the effect of the community that have made, not just the technology, it's actually more like a style of building architectures.

Tim Berglund (20:45):
Yeah, ah and that's what... Well, there are a number of things that get me out of bed in the morning, usually just waking up but professionally speaking those changing architectures are what get me out of bed in the morning. This morning actually, it was a bad dream, an action adventure dream gone slightly wrong but that's a topic for another podcast.

Tim Berglund (21:03):
Gwen, you were at Cloudera and my guess is the Kafka worldview there was big pipe Kafka, and not our event streaming vision, my guess is you have an interesting perspective here too, what do you think?

Gwen Shapira (21:21):
Yeah, I would find that in Cloudera, yeah I was definitely an odd person trying to say, "Hey the future is real-time processing," but I think really the biggest thing is that five years ago even for people who had this vision, it was so hard. And then you forget, five years ago we didn't even have the new consumer, you had to go to ZooKeeper to commit offsets, we did not even think about exactly-once, people said it was totally impossible.

Gwen Shapira (21:58):
The problems that we are trying to solve now were not even known to be problems back then, like how long does it take to recover state from a compacted topic is a big issue that people have been trying really, really hard to solve the last year. The entire problem space was not even a problem space because we didn't solve the problems that would later lead to the problems that we have now.

Tim Berglund (22:22):
Man, that's often how it goes. Michael, what do you think?

Michael Noll (22:26):
Gwen and Ben already had good answers, but I have a good pun.

Tim Berglund (22:30):
Ah perfect.

Michael Noll (22:31):
Yeah, I would say for me it's mostly about adoption because I think really event streaming has become mainstream. Got it? My background is a bit different to Gwen's and Ben's, I've been a researcher, an engineer by training but at Confluent, I've been focusing for the first few years on product management for stream processing, Kafka Streams and ksqlDB. So the thing that always impressed me the most was really the adoption, I'm not sure what the current number is but I think it's more than 80% of Fortune 100 companies that are using Kafka, I mean this is huge.

Michael Noll (23:06):
But for me, the most exciting is, that it's not just businesses, I'm living in a rural part of Switzerland, there are a little bit more than 1,000 people in the village where I live here. But even so, most people here, including my parents, families and friends, they are users of Kafka without their knowing. Now of course Kafka is used by stock exchanges, the likes of Netflix, Apple, LinkedIn and so forth. But what excited me the most was that we see that children's hospitals are using Kafka. They're using that to monitor head injuries of little children and to predict whether a doctor needs to come and help them because otherwise they might die in the next 15 minutes.

Michael Noll (23:45):
Another example is Kafka is being used for astronomy, so if there is something happening in the universe and one of the wide field telescopes, like a wide angle lens for photography, detects it, then Kafka is used to orchestrate all these large telescopes across the world to jointly point to this phenomenon and start recording. And that is something that I personally would have never guessed would happen with something that I used for cyber security or some of these use cases in the past. So that is what excited me the most.

Gwen Shapira (24:15):
I know it's great, and this is really motivating when you say that, maybe the next review we do to a PR should be a bit more careful and keep children's head injuries in mind.

Michael Noll (24:27):
Yeah, exactly. When I heard about that use case I was really flabbergasted, I said that this is really not just a theoretical use case but someone is actually doing this in practice. And I think this was a hospital in Atlanta, which might have been actually a feature on your podcast, Tim. I don't recall exactly.

Tim Berglund (24:45):
Yeah, absolutely. We'll link to that episode but that was Children's Hospital of Atlanta even using KSQL, and the specific use case was intracranial pressure monitors in, I think it was a neonatal ICU, or a pediatric ICU. I might be getting the age slightly wrong there. But you wouldn't think that intracranial pressure monitors would throw off all that much data, like how often can you sample that? But apparently the answer is very often and there's a lot of data gathered and so it was an interesting streaming use case.

Gwen Shapira (25:19):
And it's definitely important to get it correct with ordering guarantees and delivery guarantees and all the good things Kafka does.

Michael Noll (25:28):
Yeah exactly. Now I recall, I had to take my son to the hospital because he bumped his head against the wall when he was three years old and he lost consciousness, got blue lips et cetera, so I know how it is like for the parents in the emergency room to not know exactly what is going on with their child. So for me this was like an inspiring use case for the work we put in, you Tim, Gwen, Ben and the rest of our team to make Kafka really good because it's used for more things than just making you click on advertising.

Tim Berglund (26:00):
Exactly, exactly. And I mean, not to dump on advertisers there are valid ways of understanding that, but these are a little bit more effective and more immediately humanizing that you can tie to your experience, being in an ER, with your own kid. Or, in my case when I was a kid. The one being in the ER from hitting my head, instance and plus one. I did that a lot when I was a kid.

Gwen Shapira (26:33):
And that's how you turned out the way you did.

Tim Berglund (26:33):
It had to be said, Gwen. But yeah, this is meaningful stuff and these are the use cases that I think, as I said before, tend to be more likely to get us out of bed in the morning.

Tim Berglund (26:47):
Now, Michael, what hasn't turned out the way you thought it would, say where you thought things were going five years ago, what's different now?

Michael Noll (26:56):
Well I can tell you that 2020 didn't turn out the way I thought it would.

Tim Berglund (26:59):
That is a big yes.

Michael Noll (27:02):
So I would say in terms of technology or Kafka work, what would be an example? Maybe Gwen and Ben have a better answer. But I can say that I think the work on exactly-once semantics took us really a long time and much longer than we originally estimated. I think we ended up with quite a few design paths that we trotted on but eventually discarded but then thanks to the benefits of open source, other software projects can now benefit from the design that we came up with for Kafka. And thus far it has been working really well.

Tim Berglund (27:36):
Yeah, that's strange though, the thing taking longer than you thought, that's unusual in the history.

Gwen Shapira (27:42):
[crosstalk 00:27:42].

Ben Stopford (27:43):
No IT project has ever been [crosstalk 00:27:46].

Michael Noll (27:46):
But let me qualify that, that took really much longer than we originally estimated.

Tim Berglund (27:52):
Even longer, right, right. It's not an easy problem, Gwen you were talking before how that's a thing people used to think was impossible. And I'd even strengthen that, that's a thing people knew was impossible, right? It was this strong knowledge claim, and it still happens sometimes. "No, you can't do that," and they mean a closely related thing that there is a true claim to make there, but it took longer and it surprised a lot of people that it was even possible.

Ben Stopford (28:20):
Well it's also actually, probably the thing, going back to the previous questions, one of the things that probably changed the most is, five years ago I think a lot of people didn't trust Kafka so much because it comes from this big data era where everyone was doing aggregation and if you lost a bit of data it didn't really matter that much.

Ben Stopford (28:36):
I actually remember when I first got here, I spent a whole bunch, probably a day going through the code, like writing tests, trying to make sure that the ordering guarantees actually work because I wasn't totally sure. I was obviously over in the UK, and Kafka, I have to say has been designed flawlessly, it's a really elegant system on the inside, and it provides those.

Ben Stopford (29:07):
I think, A, a lot of people don't necessarily trust that but also just being able to provide things like stream processing with accuracy, how can you build a system, any decent business system unless you have correctness properties of the infrastructure you're using. It seems a bit strange to me, so I think it's well worth the effort even though it did take a long time.

Gwen Shapira (29:30):
Yeah, I think for me the most unexpected thing that happened is obviously the cloud. It's not that I didn't know about the cloud five years ago, I did. It's that I didn't appreciate how different the architectures people build for the cloud are. There are little things I would have done totally differently five years ago if I had realized how people want to use their cloud systems. Kubernetes wasn't really a thing then, maybe it existed, and it changed a lot about how people do networking, how they want to do service discovery, how they want to do their access. And just thinking that five years ago, we rewrote the network layer in Kafka but we kept the basics, because our clients have to talk to a specific broker that has the leader of a partition that they care about.

Gwen Shapira (30:28):
And this is one of the most basic Kafka assumptions, and it turns out, if you run in Kubernetes, that it's the opposite of every assumption every Kubernetes designer ever made. They're like, "Oh, you're running 50 copies of a service that are clearly all identical and the client will never care which ones they're talking to." And I think the work of making those two models mesh better, we kind of solved it, we have the operator, we have headless services, but it's hard work for the people who want to use it. And I think Kafka could evolve in the direction that really acknowledges the way people started running their cloud network applications.
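The mismatch Gwen describes can be made concrete with a small Python sketch. It is a simplified illustration, not Kafka client code: the `PARTITION_LEADERS` table, the broker addresses, and both functions are hypothetical, and CRC32 is a stand-in hash (Kafka's default partitioner uses murmur2). The point is only that the destination depends on the key, so the replicas are not interchangeable the way a load balancer assumes.

```python
import zlib

# Hypothetical cluster metadata: partition number -> broker currently leading it.
PARTITION_LEADERS = {
    0: "broker-1:9092",
    1: "broker-2:9092",
    2: "broker-3:9092",
}


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition (CRC32 used here as a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


def broker_for(key: bytes) -> str:
    """A client must send the record to the one broker that leads its partition."""
    partition = partition_for(key, len(PARTITION_LEADERS))
    return PARTITION_LEADERS[partition]


# The same key always routes to the same specific broker...
print(broker_for(b"order-17"))

# ...until leadership moves, at which point the client has to follow it.
moved = partition_for(b"order-17", len(PARTITION_LEADERS))
PARTITION_LEADERS[moved] = "broker-9:9092"
print(broker_for(b"order-17"))
```

This is why each broker needs its own stable, individually addressable network identity, which is what headless services provide in Kubernetes, rather than a single virtual address in front of identical pods.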

Michael Noll (31:11):
And Tim, maybe I can add something to what Gwen just said because Gwen, what you probably highlight is also, just the general impact of cloud on technology.

Gwen Shapira (31:20):
Yeah.

Michael Noll (31:21):
And one thing that I noticed and the reason why it's still fresh in my mind is because Ben and I are writing a blog post that should be published by the end of August on trade-offs in distributed systems design. So we're looking at Kafka and some other systems like it. And one thing that always stood out to me was that along the way, the developers of Kafka made some really smart design decisions, that are still applicable and useful in the cloud. Now, give or take, right? There are some things where we will need to improve, Gwen alluded to some of them but overall the idea of the core reactors in Kafka, the way that the components interact with each other et cetera, that has really stood the test of time thus far. Even though we're seeing this dramatic shift in how infrastructure is being deployed and used in the cloud, nowadays.

Gwen Shapira (32:14):
You're right, that's what happens when you're in the trenches, you only see the stuff that doesn't work and you want to fix. Most of the things do work.

Tim Berglund (32:21):
Right, right. You don't see the tremendous legacy of success like Ben alluded to. Ben, if we opened the podcast with cold-open quotes, "Kafka's been designed flawlessly" would definitely be the one we'd use for this episode. But there's so much that's been done right, and Gwen, especially in your position, you don't see that. You just see what's wrong and what's growing and what's painful.

Ben Stopford (32:44):
Actually, it's a bit of an aside, but I'm going to dig in anyway.

Tim Berglund (32:48):
Do it.

Ben Stopford (32:48):
I think somebody needs to write a book, so I challenge anyone who's out there listening to write a book on basically the internals of different systems and how they're designed. Particularly the open source ones. Kafka is the only one I know really well, but actually you can learn a lot from the way these systems are designed, and they're quite, yeah, they're very elegant.

Gwen Shapira (33:12):
Didn't Martin Kleppmann write this book?

Ben Stopford (33:14):
Well, yeah, I think of it more like a deep dive into the nice properties of each of these systems. So just to be a bit more concrete about that, Kafka's design is basically two thread pools: network threads, which do asynchronous IO, and IO threads, with a buffer between the two. That design is very, very simple. It offloads, in theory, the disk operations to a separate thread pool, it ensures ordering of messages, and it has this purgatory structure which allows you to delay responses and also manage replication.

Ben Stopford (33:57):
And what you're actually getting in there is a whole bunch of functionality that's tailored for this use case in this very, very simple conceptual model. I actually think it's really elegant. I said that before, didn't I?
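
The two-pool shape Ben describes can be sketched roughly as follows. This is a toy model under stated assumptions, not Kafka's broker code: `run_broker_sketch` and its parameters are hypothetical names, one Python thread stands in for the network pool, an in-memory list stands in for the on-disk log, and the purgatory for delayed responses is left out entirely.

```python
import queue
import threading


def run_broker_sketch(requests, num_handlers=1, queue_size=4):
    """Push requests through a bounded queue that sits between two thread pools."""
    request_queue = queue.Queue(maxsize=queue_size)  # the buffer between the pools
    log = []                                         # stand-in for the on-disk log
    log_lock = threading.Lock()

    def network_thread():
        # Stand-in for the network pool: accept requests and enqueue them.
        for request in requests:
            request_queue.put(request)  # blocks when the queue is full (back-pressure)
        for _ in range(num_handlers):
            request_queue.put(None)     # one shutdown sentinel per handler

    def handler_thread():
        # Stand-in for the IO pool: dequeue requests and do the "disk" work.
        while True:
            request = request_queue.get()
            if request is None:
                return
            with log_lock:
                log.append(request)

    threads = [threading.Thread(target=network_thread)]
    threads += [threading.Thread(target=handler_thread) for _ in range(num_handlers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

With a single handler the log preserves arrival order because the queue is FIFO. With several handlers this sketch guarantees only that every request is appended exactly once; how the real broker preserves ordering is not modeled here.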

Tim Berglund (34:09):
You did, but you should keep saying it.

Gwen Shapira (34:11):
Yeah, event loops are a great architectural pattern in general.

Michael Noll (34:16):
And I think also, there is another aspect to it as well. I recall I was in a Kafka meet-up and there was a company called Doodle, doodle.com. I'm not sure whether it's a thing in the US, it's essentially a simple web service where you can schedule a meeting with other people. Which is pretty useful particularly if it's across companies because you need to find meeting slots, et cetera, et cetera.

Michael Noll (34:38):
And it looks like, on the surface, it's something that you can write in an afternoon as part of a learn-to-program exercise. And in that meeting, the lead engineer for that Doodle service showed us the code tree of all the corner cases that can happen as you are trying to schedule a meeting with colleagues, friends, or customers, et cetera, and you realize that, oh god, this is actually really complicated. You can't do this in a single day or in a week, or in a month, this takes you many, many months to get right.

Michael Noll (35:11):
And now imagine you're building a distributed system like Kafka, that does all these complicated things whether on-premise or in the cloud. So once you see that, and the corner cases that Gwen and her team are working and fighting with, this is not something that you can easily replicate with a new open source project that has been around for a year or two, and you expect the same quality or the same performance as Kafka has.

Tim Berglund (35:33):
Absolutely right, this reminds me of a Joel Spolsky essay from about 600 million years ago, if I recall correctly. It was about rewriting: a complete rewrite is almost never the right thing to do because you've got all those solved problems and that knowledge built into the code. And he was just talking about business software, not even infrastructure or distributed systems infrastructure, but in our case, all those corner cases are literally encoded in the codebase and that knowledge takes a long time to accrue.

Gwen Shapira (36:09):
I don't know, guys, pretty much every weekend I sit down and I'm like, maybe this is a good time to rewrite Kafka.

Tim Berglund (36:14):
I know, I know. It is the temptation of all engineers when they are facing a codebase to do that, and it's probably wrong. In that vein, what do you three feel like you know now that you didn't know then? We're talking about lessons learned and what's a lesson or two that you've learned?

Michael Noll (36:36):
Ladies first, Gwen?

Tim Berglund (36:38):
Yes, I just made that a jump ball.

Michael Noll (36:40):
Sorry if I put you on the spot, that wasn't my intent.

Gwen Shapira (36:43):
Five years is a huge amount of time and I learned so many things. I think for me the big lessons were more personal than distributed systems-wise. Oh, I have a distributed systems lesson: you really need a lot of epochs when you start designing a distributed system. You probably underestimate how important epochs are, how many of them you need, and how much you need them everywhere.

Gwen Shapira (37:14):
If you look at the journey of Kafka through the last five years, you can chart it as the gradual adding of epochs to every single piece of the protocol. I think Ben started the trend so he can talk more about how it got started, but now it's everywhere and we almost do it by instinct five years later.

Ben Stopford (37:36):
Was that first one KIP-101? I should remember this, can't remember.

Gwen Shapira (37:43):
Yes.

Ben Stopford (37:44):
That's how bad my memory is, but whichever KIP it was, all the builds miraculously went green, it was really exciting. All these [inaudible 00:37:57], I thought they were intermittent failures but they weren't, it was the lack of epochs. So there you go, yeah, I would agree.
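
To make the epoch idea concrete, here is a minimal Python sketch of epoch-based fencing. It is not Kafka's replication protocol: `PartitionState`, `StaleEpochError`, and the method names are hypothetical, and the only assumption carried over is the general technique of bumping a counter on every leadership change and rejecting anything tagged with an older value.

```python
class StaleEpochError(Exception):
    """Raised when a write carries an epoch older than the current one."""


class PartitionState:
    """Toy partition that fences writes from deposed leaders (hypothetical API)."""

    def __init__(self):
        self.leader_epoch = 0
        self.log = []  # list of (epoch, record) pairs

    def elect_new_leader(self):
        # Every leadership change bumps the epoch and hands it to the new leader.
        self.leader_epoch += 1
        return self.leader_epoch

    def append(self, sender_epoch, record):
        # A sender holding an older epoch was leader in the past and must be
        # rejected, otherwise its late writes could silently diverge the log.
        if sender_epoch < self.leader_epoch:
            raise StaleEpochError(
                f"epoch {sender_epoch} is older than current epoch {self.leader_epoch}"
            )
        self.log.append((sender_epoch, record))
```

Without the epoch check, a delayed write from the old leader would look exactly like a valid one. Failures of that kind tend to show up only intermittently, which fits Ben's description of flaky builds that turned out to be missing epochs.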

Tim Berglund (38:07):
The Kafka community is in my opinion one of the most robust and kind and welcoming and otherwise healthy technology communities I've ever seen, I say that because this sounds like a negative question but what do you think the technology community broadly, even outside of Kafka, has missed or is undervaluing when it comes to Kafka and things in its orbit?

Gwen Shapira (38:34):
Testing.

Tim Berglund (38:35):
Testing, Gwen says. Engineering manager says testing, and yeah, it doesn't have to be a one-word answer. Ben? One-word answer, Ben? It can be more than one word.

Ben Stopford (38:44):
A one-word answer?

Tim Berglund (38:45):
It can be more than one word.

Ben Stopford (38:47):
Yeah. In terms of outside, I think I would put it there and most people [inaudible 00:38:53] a lot of the core value. I think there's a long way for us to go, so I'd put it the other way around, I think. I'd say that even though it's pretty popular, I still think we're only 20, 30% of the way there. So I think actually the onus is on us to make sure that the whole vision is completed. Everyone, in fact, not just us.

Tim Berglund (39:18):
The whole vision of event streaming is completed from a product and from an awareness perspective, do you mean?

Ben Stopford (39:24):
Oh, actually I also meant more from the way that people implement the systems. If you think about it, for most businesses today, and the big ones are global, an event streaming platform should hopefully form some kind of central nervous system: something that sits at the heart of the company and allows you to move data around, allows you to be very agile about the way that you build new projects and make data available at high throughput. And that's still quite hard to do. The cloud has been a big help with this, but even building that today, you still need to put some pieces together yourself, and really, for this to be a shrink-wrapped product, all of these things need to be automated. It has to be very, very natural, and you actually have to be able to integrate with lots of different systems in a way that's hands-off. For us, I think the big challenge is making that easy in the cloud.

Tim Berglund (40:32):
Great framing there, Michael, anything?

Michael Noll (40:35):
Well I'm confused now, Tim, shall I give you a Gwen one-word answer or a Ben one-word answer?

Tim Berglund (40:40):
Somewhere in between, if you could take the average of the two.

Michael Noll (40:44):
And I noticed Ben wasn't laughing.

Ben Stopford (40:47):
I was laughing, I'm not sure we understand the concept of one-word answer. [crosstalk 00:40:53].

Michael Noll (40:55):
To answer your question, Tim. I would say-

Tim Berglund (40:57):
I had the developer advocate team on the podcast just last week and so you have no idea, don't worry about your wrong answers, it's fine.

Michael Noll (41:06):
So to give you a one-word answer, I would say, for me it's two words in English, it's user experience. And the slightly longer answer would be, I truly believe that to make any kind of technology successful, it must be a joy for people to use and put a smile on their face. And probably the best example I can give, and I don't want to talk about an Apple product or whatever: for me the coolest, most well-designed product that I've ever seen thus far is actually a musical instrument, and the instrument is called the OP-1 from Teenage Engineering.

Michael Noll (41:42):
It's a small Swedish company, I think they have 15 or 20 people, and they made something that looks like an electronic keyboard but it's like an all-in-one digital audio workstation. You can do sampling on it for drum beats, for vocal recordings, you have your four-track recorder on it, so you can literally build a song from start to finish on this device. It's battery powered so you can take it with you on a flight, for example, it lasts for 16 hours or so, and it's so good that it's actually featured in the New York Museum of Modern Art.

Michael Noll (42:15):
And I believe if we at Confluent, or people who are listening in, in their own companies can build something that reaches this level of mastery, you can really be proud of yourself because you made a lot of people very happy and helped a lot of people get their job done.

Tim Berglund (42:31):
With that, that's a tremendous vision. Last question, we're coming up against time here, where do you think the event streaming world will be, what do you think it will look like five years from now? You're looking back five years, you've got perspective that most human beings don't have, where do you think we're going to be?

Ben Stopford (42:52):
I'll give it a go, I think probably the thing that I'd like to think will change is that... I grew up and I think we probably all grew up in a world where you put data in a database and that was basically it, certainly that's how I grew up. I'd like to think that the current generation are going to grow up in an era where that's actually not necessarily true. When they think about data, they won't just think, I'm putting it in a database, they'll think, oh I need to move this around, and that's actually part of the way that you program systems, that's part of the way that you put together architectures.

Ben Stopford (43:35):
And for me, in five years' time, yeah, if it's not already solidified, I think it will be, and I think that transcends Kafka. That transcends the whole event streaming thing. I think it's just a broader concept that today we're just more interested in linking data together, operating on it while it moves, rather than just leaving it locked up inside some database which we access using an RPC protocol.

Michael Noll (44:08):
Yeah, I'm with Ben on that. I would say that this move to faster data, real-time data is really prevalent. We see this not just in our space with Kafka, but also streaming as in Twitch streaming. There are celebrities now that you would never have guessed would exist 10 years ago, who are now vastly popular, particularly with the younger generation. If there is a major esports event, like Counter-Strike or League of Legends, there are millions of people online to watch this event live. So it's not quite yet a Super Bowl audience but it's getting closer and closer.

Michael Noll (44:48):
And I recall that just recently, a few days ago, I think the Spotify CEO said something along the lines of an artist today is expected to release essentially every day, every week because you cannot afford to go into recess for three, four years to come up with your next album. And I think this need for more information or at least this perceived need to get everything readily available at your fingertips, this is something that is impacting us as a society and not just people in the Kafka event streaming tech space.

Gwen Shapira (45:17):
Yeah, I hope that in five years there will almost not be an event streaming space as a thing. If you look at it like when I started my career, design patterns were a huge thing and everyone was talking about design patterns, buying design patterns books and these days nobody talks about that, they just write code this way. Just like the good ones made their way into best practices and nobody really talks about design patterns as such, they just talk about nice clean code.

Gwen Shapira (45:50):
And I hope that in five years nobody will talk about event streaming, that people will just write their applications that way and people will not have to have... These days, you really need to know Kafka well, people even asking for themselves before they start writing to Kafka. They really take time to learn it, which is a good thing, but you also think about all those people who write to databases with Hibernate and stuff and they don't know about databases, and I want that for Kafka.

Gwen Shapira (46:25):
In a way I almost want our users to become more ignorant, if that's an acceptable thing to say.

Tim Berglund (46:35):
My guests today have been Gwen Shapira, Ben Stopford, and Michael Noll. Gwen, Ben, Michael, thanks for being a part of Streaming Audio.

Ben Stopford (46:42):
Thanks Tim.

Gwen Shapira (46:43):
Thank you, Tim.

Michael Noll (46:44):
Thanks for having us.

Tim Berglund (46:45):
Hey, you know what you get for listening to the end? Some free Confluent Cloud, use the promo code 60PDCAST, that's 60PDCAST to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31st 2021 and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeit. There are a limited number of codes available, so don't miss out.

Tim Berglund (47:14):
Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me at tlberglund on Twitter, that's @tlberglund. Or you can leave a comment on a YouTube video or reach out in our community Slack. There's a Slack signup link in the show notes if you'd like to join. While you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold.

Tim Berglund (47:42):
If you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. Thanks for your support and we'll see you next time.