Streaming Audio: Apache Kafka® & Real-Time Data

Scaling Developer Productivity with Apache Kafka ft. Mohinish Shaikh

January 20, 2021 | Confluent, original creators of Apache Kafka® | Season 1, Episode 139

Confluent Cloud and Confluent Platform run efficiently largely because of the dedication of the Developer Productivity (DevProd) team, formerly known as the Tools team. Mohinish Shaikh (Software Engineer, Confluent) talks to Tim Berglund about how his team builds the software tooling and automation for the entire event streaming platform and ensures the seamless delivery of engineering processes across engineering and the rest of the org. With the right tools and the right data, a developer productivity team can understand the overall effectiveness of a development team and its ability to produce results.

The DevProd team, which actively uses Apache Kafka®, helps engineering teams at Confluent ship code from commit to end customers. The team works with a wide range of polyglot applications and understands the complexities of using a diverse technology stack on a regular basis to help solve business-critical problems for the engineering org.

The team actively measures how the systems interact with one another and what programs are needed to properly run the code in various environments, which helps it release reliable artifacts for Confluent Cloud and Confluent Platform. An in-depth understanding of the entire framework and development workflow is essential for organizations to deliver software reliably, on time, and within budget.

The DevProd team provides that second line of defense and reliability before the code is released to end customers. As the need for compliance increases and the event streaming platform continues to evolve, the DevProd team is in place to make sure that all of the final touches are completed. 

Tim Berglund:
When you build software, you build it with tools: compilers, testing frameworks, Gradle, Maven [inaudible 00:00:17]. But development efforts of even moderate complexity require custom tooling. We're increasingly calling this the discipline of developer productivity. Learn what it's all about from Mohinish Shaikh, and learn what it has to do with Kafka, on this episode of Streaming Audio, a podcast about Kafka, Confluent, and the Cloud.

Tim Berglund:
Hello and welcome to another episode of Streaming Audio. I am your host, Tim Berglund, and I'm joined today in the virtual, internet connected, live streaming, at least at the moment, studio by my coworker Mohinish Shaikh. Mohinish is an engineer on the developer productivity team at Confluent. Mohinish, welcome to the show.

Mohinish Shaikh:
Thanks, Tim. Thanks for having me on the show.

Tim Berglund:
Oh my pleasure. What is developer productivity? I'll just throw in a historical note for people listening who don't know. I know you guys used to be called the tools team and now you're called the developer productivity team, which I think sounds better. But what is developer productivity? What do you mean by that?

Mohinish Shaikh:
Yeah, that's an interesting question. There have been other names for this in the past, too, like engineering services or engineering productivity. But my definition of developer productivity is building systems and workflows that cover basically anything and everything between commit and production, and sometimes beyond, because there's compliance, legal, and other things. That's the simplest definition, but as simple as it sounds, it's a lot more complex and involved.

Tim Berglund:
So the software you write, does any of it get released to the public as open source or a part of a commercial product? Or do you create tooling to enable other engineers who are working on Confluent Cloud, and Apache Kafka, and things like that?

Mohinish Shaikh:
Yeah, it's the latter. What we really do is help the engineering teams be more productive. That means helping their code get to production so that they can focus on writing code, while we help them with the flow of their code all the way to production and the customers. It covers a really broad area. Going from commit to production alone has a lot of scope.

Mohinish Shaikh:
There are commits involved: commits coming into source code management systems from teams, and not even just engineering teams. Sometimes it's the developer relations team, or maybe the solutions team, or other teams. We help them with systems like build, release, deployment, packaging, and CI/CD. There are a whole bunch of aspects around infrastructure management and automation. And of course, as part of the same pipeline or workflow, we also need to build things to make sure the artifacts are shipped with quality, and provide test automation infrastructure. On top of that, we also have to build systems to monitor, log, and trace everything that happens as part of this flow of code, keeping in mind security, compliance, and audit requirements. So that's the whole breadth of things we generally do.

Tim Berglund:
Among the teams you serve, you mentioned developer relations. Those guys are the worst. I mean, they are just so demanding and hard to work with, and they never say thank you. And oh man, I can't stand those guys. And just so you know, listeners, if you couldn't figure it out, I am in fact a member of the developer relations team. So yes, that was self-deprecating humor, everybody. Talk to me about that last thing, which was kind of interesting. You talked about audit-trail-type stuff for logging and monitoring what's going on with the build pipeline or with the CI/CD pipeline. What's that all about?

Mohinish Shaikh:
Yeah, so it's not one pipeline. As I said, from commit to production there are a lot of stages in between: the build itself is a stage, and the release itself is a stage. So we try to build tooling and workflows around all of these different stages so that the code propagates from compilation, builds, and smoke tests into artifacts. From there, the artifact may get moved into a test pipeline where it gets tested, and then it probably gets moved to a release pipeline or packaging pipeline, across both CI and CD. Plus, these are not single systems. We have to support staging environments, QA environments, integration environments, performance environments, and finally, the production environment. So each of these pipelines or stages is isolated by these environments.

Mohinish Shaikh:
And really, when there are multiple environments and so much complexity involved, we need to know what's happening in which environment, which pipelines, and which workloads. That matters from two perspectives. Number one, we need to know what's happening in our own DevProd infrastructure. And number two, we need to know whether the code that is propagating is going through the right sanity checks across these pipelines, so that we can validate these artifacts. What that really means is that we are actively engaged with all these various teams across the org to help speed up development. We establish processes, we establish operational guardrails, we engineer solutions, we engineer services to support those solutions, and we run all of that on our own infrastructure. So for monitoring, we need to monitor both our own systems and the code that's flowing through them, because there's a lot of complexity there, and we need to cover for that.
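To illustrate the stage-by-stage flow described above, here is a minimal Python sketch under stated assumptions, not Confluent's actual tooling. An artifact is promoted through an ordered list of environments and stops at the first environment whose sanity check fails. The environment names, the `Artifact` class, and the `promote` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical environment order, loosely modeled on the stages named in the episode.
ENVIRONMENTS = ["build", "test", "integration", "performance", "staging", "production"]


@dataclass
class Artifact:
    name: str
    version: str
    history: List[str] = field(default_factory=list)  # environments passed so far


# A gate is a sanity check: it returns True if the artifact may proceed.
Gate = Callable[[Artifact], bool]


def promote(artifact: Artifact, gates: Dict[str, Gate]) -> Optional[str]:
    """Move the artifact through each environment in order.

    Environments without a gate pass by default. Returns the name of the
    first environment whose check fails, or None if the artifact reached
    production.
    """
    for env in ENVIRONMENTS:
        check = gates.get(env)
        if check is not None and not check(artifact):
            return env  # stop here; later environments never see this artifact
        artifact.history.append(env)
    return None
```

For example, `promote(Artifact("example-service", "1.0.0"), {"performance": lambda a: False})` returns `"performance"` after recording build, test, and integration in the artifact's history. Keeping the history on the artifact is one simple way to support the "which environment is it in?" question raised above.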

Tim Berglund:
Which makes sense. Besides the monitoring of those pipelines, what are some of the tools that you guys are responsible for creating? And I'd like to dig into some of those a little bit.

Mohinish Shaikh:
On your earlier question about open source, whether we contribute stuff to open source or not: sometimes we do, and it's mostly around tooling. For example, let's say we build [inaudible 00:07:09] dependency graph of the artifacts flowing through our pipeline. That's a very common problem across the industry, and maybe there are use cases that other teams and other companies can also use. I think Vagrant is one such example: the founder of HashiCorp was able to push the Vagrant tool to the point where it got adopted even outside the company where he was working at the time.

Mohinish Shaikh:
But for the most part, we build stuff for internal use cases or internal business requirements. We end up using a lot of open source components, but we also write a lot of proprietary code that acts as a wrapper on top of these open source, off-the-shelf components to build the systems and solutions that solve these business use cases. So in general, what we build is common tools, common systems, common frameworks, and common infrastructure to establish common standards, architectural patterns, automation, and workflows as part of our tooling. That way, all of the code these engineering teams produce goes through the same common, standardized workflow, so each of the artifacts gets the same treatment. And that's how we guarantee the reliability and the quality of the artifact once it comes out of the pipeline.
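The dependency graph of artifacts mentioned earlier in this answer is a common problem, and one standard way to turn such a graph into a safe build order is a topological sort. Below is a minimal sketch using Python's standard-library `graphlib` module (Python 3.9+). The component names in the usage note are hypothetical placeholders, not Confluent's real build graph.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every artifact is built after its dependencies.

    `deps` maps each artifact to the set of artifacts it depends on.
    graphlib raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def group_by_level(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group artifacts into batches whose members could be built in parallel."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    batches: List[List[str]] = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only to make the output deterministic
        batches.append(ready)
        ts.done(*ready)  # mark the batch finished so its dependents become ready
    return batches
```

With a hypothetical graph such as `{"broker": set(), "connect": {"broker"}, "streams": {"broker"}, "bundle": {"connect", "streams"}}`, `group_by_level` yields `[["broker"], ["connect", "streams"], ["bundle"]]`: the two middle components can be built in parallel once the broker is done, and the bundle waits for both.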

Tim Berglund:
You gave a talk a couple of years ago, I think at the Kafka Bay Area Meetup, on some of the work that you do. It was also about microservices and Kafka: scaling developer productivity. Could you kind of walk us through that? What was the basic message of that talk? Give us a small version of it. We're all here.

Mohinish Shaikh:
Yeah, you're right. The basis of the talk was DevProd and how complex the scope of DevProd itself is. Compare it to, let's say, a backend engineering team, which is probably using Java or a limited toolset to do its job. The scope of DevProd is so broad that there are all kinds of polyglot applications involved: Python, Java, Maven, Gradle, Jenkins, and some teams use other CI systems, so there's a whole bunch of complexity there. And then Docker throws in a whole other level of complexity, [inaudible 00:10:06]. So the point is that the landscape of DevProd is super complex from a technology stack perspective, because each problem space has its own toolset.

Mohinish Shaikh:
For example, let's say we talk about CI. Then there's Jenkins, and to run workloads on Jenkins, we use Docker, which means that we need Docker images for these [inaudible 00:10:34]. These Docker images are then wrapped up with, let's say, the build logic or the test logic that needs to run to perform either build functions or test functions against the code. So the point being, the landscape is super complex. And sometimes these systems need to interact with each other.

Mohinish Shaikh:
For example, let's say the build pipeline needs to pass the artifact on to the release pipeline, or to something else, let's say the monitoring system, right? You're trying to trace this artifact from commit to production as it goes through all these environments and all these stages. So that's another problem, and this brings me to my second point. This is where Kafka comes in: it helps with inter-environment, inter-stage, and inter-workflow message passing between the systems. And it makes the traceability aspect, or the flow of information, readily available for us to go and probe, make use of that data, and gather insights.
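To make the traceability idea concrete, here is a small, self-contained Python sketch. It does not use a real Kafka client: `PipelineEventLog` and `trace` are hypothetical names for an in-memory stand-in for a Kafka topic. The sketch mirrors two Kafka properties that matter here: records with the same key land in the same partition, and order is preserved within a partition. Keying every pipeline event by artifact ID therefore lets a consumer replay one artifact's path from commit to production in order.

```python
import json
import zlib
from typing import Dict, List, Tuple


class PipelineEventLog:
    """Hypothetical in-memory stand-in for a partitioned Kafka topic.

    Each partition is an append-only list. A record's key picks its partition,
    so all events for one artifact stay in order within a single partition.
    Kafka's default partitioner hashes keys with murmur2; crc32 is used here
    only because it is stable across Python runs.
    """

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, event: Dict[str, str]) -> int:
        """Append a JSON-encoded event and return the partition it went to."""
        p = self._partition_for(key)
        self.partitions[p].append((key, json.dumps(event)))
        return p

    def read_key(self, key: str) -> List[Dict[str, str]]:
        """Replay every event for one key, in the order it was produced."""
        p = self._partition_for(key)
        return [json.loads(value) for k, value in self.partitions[p] if k == key]


def trace(log: PipelineEventLog, artifact_id: str) -> List[str]:
    """Return the stages an artifact has passed through, oldest first."""
    return [event["stage"] for event in log.read_key(artifact_id)]
```

In this sketch, a build stage would call `log.produce("example-artifact-1.0.0", {"stage": "build", "status": "passed"})`, the release stage would do the same with its own stage name, and a monitoring service could call `trace(log, "example-artifact-1.0.0")` to see where that artifact is. The stages never call each other directly; they only share the log.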

Tim Berglund:
So it's a kind of application development. I mean, what you were describing there, with a number of services that have events and state they need to share with each other, kind of just sounds like application development. There are a lot of individual things you've mentioned that you do under what you're calling developer productivity. In some ways it's a service team that has to build a bunch of small tools to support an organization, but at the big, strategic end of things, you're building an application. And you just described the same application architecture problem that everybody else in the world is trying to solve. It's not just, "Oh, here's a bunch of build tools and we'll package up Docker images for you."

Mohinish Shaikh:
You're right. And this was the third point of my talk. We do this by building smaller microservices, because really these tools are smaller services running in their own isolation. For example, let's say we just did branch cuts across a whole bunch of repositories because we were trying to get a release out. When that pipeline kicks in, all these services get launched as part of the pipeline or workflow. Each of these smaller tools that run as services then sends data: informational data, transactional data, or just simple inter-service messages. All of that goes out into Kafka rather than into some database that other services would then have to pull the state from.

Mohinish Shaikh:
So you're right: more or less, it's an application architecture. But a typical application usually has a well-defined tech stack. For example, if you're building, I don't know, let's say a login application, your choice of tools is maybe a service written in Java or maybe Go, and maybe you throw it in Docker. So your tech stack is kind of limited. The DevProd tech stack is super diverse, because each stage comes with a unique set of problems. For example, let's say we want to provision a whole bunch of build nodes or some test infrastructure. Then the tool there is Terraform, or maybe CloudFormation. But the tooling that goes on top of these underlying open source components as a wrapper, to initiate them, runs as services, and those services pass messages along.
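The contrast drawn above, services emitting events into Kafka instead of writing state into a database that others must poll, can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration rather than Kafka client code: a plain list plays the role of the topic, and `ReleaseView` (a made-up name) plays a consumer that remembers its offset and derives the current branch-cut status of each repository from the events it has read.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BranchCutEvent:
    repo: str
    branch: str
    status: str  # hypothetical values: "started", "done", "failed"


@dataclass
class ReleaseView:
    """Consumer-side state derived from an append-only event log."""

    offset: int = 0  # position of the next unread event, like a committed consumer offset
    status_by_repo: Dict[str, str] = field(default_factory=dict)

    def poll(self, log: List[BranchCutEvent]) -> int:
        """Apply events not seen yet; return how many were applied."""
        new_events = log[self.offset:]
        for event in new_events:
            self.status_by_repo[event.repo] = event.status  # latest event per repo wins
        self.offset += len(new_events)
        return len(new_events)

    def ready_to_release(self) -> bool:
        """True once every repo that reported a branch cut has finished it."""
        return bool(self.status_by_repo) and all(
            status == "done" for status in self.status_by_repo.values()
        )
```

Because the log is append-only and each view tracks only its own offset, a second consumer, such as a monitoring dashboard, could read the same events independently without coordinating with the release view. That independence is one reason a shared event log can be attractive compared with polling a shared table.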

Tim Berglund:
Got it. I was about to say, they're behaving as microservices. They're not just behaving as microservices, they are microservices. And so, the Kafka angle here then becomes the event logging and event streaming backbone through which those microservices contribute or communicate.

Mohinish Shaikh:
You're absolutely right.

Tim Berglund:
Wow. Okay. I had no idea. I did not know that was a thing developer productivity did. Now, what do you think? This is all super cool, but people listening work for all kinds of different companies. Some of our coworkers listen, but the vast majority of Streaming Audio listeners do not work at Confluent, which is a good thing. And I wonder, what is special about what we do? In fact, no, Mohinish, let me put that question a different way. Is there anything special about what we do that makes the developer productivity function so much more important?

Tim Berglund:
And what we do, I'll just kind of define it, is we create a data infrastructure product called Confluent Platform. There are people who work at Confluent who contribute to Apache Kafka, but we don't cut those releases. That's Apache: the project does that, and the PMC makes those decisions. At Confluent we have some committers lying around who do that work, and they're very smart and do very good work, but we create Confluent Platform. We create Docker images of pieces of Confluent Platform, and we create Confluent Cloud. So: a cloud service that is a data infrastructure service, and an on-premises product that is a data infrastructure product. Is there anything about us and our product and service that makes you more necessary? I should tell you what my follow-up question is going to be: help other kinds of teams think about how this function ought to work. If you don't release a product like this, but you're just an internal IT department, what role does developer productivity play in those kinds of worlds? So talk about us first, but then help everybody else understand why they need you.

Mohinish Shaikh:
Yes. I think the reason the DevProd team is critical for us is, number one, that we want to make engineering teams a lot more productive. If the DevProd function did not exist, then each engineering team would need to spend time on some of these... how do I say it? Post-coding problems, I guess. Right? I mean, sure, the engineers write code, which is great, and they do a great job, but someone needs to make that code shippable and also deployable. And that's the landscape we're talking about, right? So if we don't have this function, the engineers typically tend to do some of this by themselves, which is fine in a small org of maybe 10, 20, or sometimes 50 people, just because the teams are small there.

Mohinish Shaikh:
The product itself is small, so the build might take just a couple of minutes, and the deployment might take, I don't know, again a couple of minutes, or maybe an hour.

Tim Berglund:
What a nice world that would be, right?

Mohinish Shaikh:
Yeah, which would be super easy. But as we scale and grow, these things tend to take longer because of a whole bunch of complexity around product expansions, lines of business, complementary services, going after markets, having partners, and community. Then there is a whole bunch of requirements around compliance, laws, and geographies. I mean, we all know GDPR. Languages and cultures too, right? Sure, a small team can do it by themselves. But as the team becomes [inaudible 00:19:36], I think 300 engineers, if I'm not wrong.

Mohinish Shaikh:
Having all those engineers work on the same problem is not great, and that's where this function comes in. What it takes away is everything the engineers don't need to worry about. This is why I always say we do everything from commit to production. The point here is that the engineers can focus on writing code, and we make sure that their code is shippable and deployable, again, with speed. There's a quote I sometimes bring up: if the sales organization promises stuff to customers, DevProd helps the organization deliver it with speed. And that's really the gist of it. I can tell you how that relates to us. For example, Confluent has its own Apache Kafka distribution. What that means is we have to have processes, systems, and tooling in place to mirror and sync the code from the Apache distribution and apply a whole bunch of our IP and practices around it.

Mohinish Shaikh:
Then we pull together all the other products in the Confluent Platform ecosystem, for example, Kafka Connect and Kafka Streams, which sometimes get bundled with Confluent Platform itself, package them together, build them together, and officially make a release. We just recently did a 6.0.0 release, which was a great release. So that means building systems that can pull all this stuff together, build it, and run quality checks and testing. Then there's complexity around supporting these artifacts across different environments, so compatibility testing kicks in, and we run a whole bunch of system tests, which take hours. The build itself takes hours, and the packaging and release process itself takes hours. We also produce Docker images for different platforms.

Mohinish Shaikh:
For example, Red Hat UBI, you name it. And then those images go through their own set of testing processes, plus things like complying with FedRAMP requirements and other standards, PCI, for example. We don't want to ship stuff like Docker images with passwords [inaudible 00:22:27] in there. So there's a whole bunch of quality checks and bounds that need to be enforced, and that's where our tooling comes in. This is where I think this function is essential. This also applies to other orgs, and I think it's why some companies have this function.

Tim Berglund:
So if you were not a vendor... We're a vendor, right? For some reason that word always seems like a dirty word, but it's not dirty, and it's true. We make a product that people buy. And so, since we're a vendor, this all makes sense: the packaging, the Docker images, which by the way... I mean, we've worked together, and I have such a limited view of what your team does. I think of the Docs Build System and Docker images. Those are the two things that come to mind for me, so I'm learning things today too. But if you weren't a vendor and you were in the IT organization of some company that did some other thing, a retailer or a manufacturer, a building materials company, or something like that... If you were suddenly transported to that alternate reality where you worked in that company's IT department, how would you rethink your work and apply it there? What kinds of things would you do at a non-vendor, non-product company?

Mohinish Shaikh:
Yeah. First, we're not an IT function. IT generally helps with user onboarding and management of corporate users. We're a really cool engineering team on the internal side, and some of that you've already experienced. We're kind of a black box for all the other teams, because we build tools and systems, and as I said, once you commit your code, you can just forget it. You just give us requirements, and we'll make it happen. That's kind of the contract, right? So you don't have to worry about testing. You don't have to worry about building. You don't have to worry about deployments. You don't have to worry about shipping. All of that goes through these systems that we build, which include tools, processes, workflows, pipelines, and environments.

Mohinish Shaikh:
Sure, if things fail, the teams get notified through either Slack channels or incident-response-type channels. Let's say, for example, a test case fails or the build fails. That's the only place where we want the engineers to come and fix their stuff. So what we really do is enable quick feedback loops across the whole pipeline. The engineers get real-time, or close to real-time, updates about how their code is propagating, and if some check fails, they get notified. Then the team has to act on it, make a fix or an update, and re-push the code. The whole pipeline starts again, from build to packaging, to release, to Docker images, and basically makes the code shippable with quality and reliability, and deployable.

Tim Berglund:
Yeah. Well, it was about ten years ago now that the argument was first made publicly that software is eating the world. Take Nordstrom: it's not so much about companies building more and more productivity applications to help workers do their work. Companies are becoming software. They're not just using more software; they're turning into software. And that applies even to companies that aren't tech startups and aren't infrastructure vendors like we are. We're the purest kind of software company there is: we make software that other people use to make software. But even if you're... Say, I'm thinking of, oh, they used to be called Lafarge, and they were acquired by another French company. They're a building materials company: concrete, aggregate, and things like that. So if you sell an extremely traditional thing like that, where you basically sell rock, you're kind of also a technology company.

Tim Berglund:
I know this because my father-in-law recently retired from IT there, and we always talk about the kinds of applications his teams were developing. So they're a software company, even though they sell pieces of broken-up rock and sand, which seems like the lowest-tech thing in the world. So you still have a need like this whether or not you release any sort of technology product. You do release technology products internally, and you probably expose services and APIs to your partners, vendors, and customers on the outside anyway. So this seems like it's becoming a more and more important function.

Mohinish Shaikh:
Yeah, I think you're right. It's already a very critical function in technology companies. We talked about Nordstrom and, let's say, Macy's. Their main business is selling merchandise, but internally they have engineering teams that help manage and maintain, let's say, macys.com or nordstrom.com, which is a deployable web application itself. And I believe they also have teams working on different modules of their web applications. And you're right: if there is complexity around shipping modules or artifacts, and also deploying them, there is a need for a DevProd function. We also talked about companies whose primary business is relatively non-tech.

Mohinish Shaikh:
In that case, take the stones and sand company, for example. Maybe they are using a third-party vendor to enable their sales, let's say Shopify, which is very easy to set up, and there's no internal tooling that needs to be built. But maybe they have an IT department, because the function of an IT department is to secure and enable corporate users. Generally, their primary responsibility is to help with corporate user onboarding and also to secure the perimeter: establishing VPNs, managing devices and network access, things like that. Sometimes they end up contributing internal tooling, and there could be a function within that where similar DevProd tooling might be needed. I mean, if there's engineering involved at a decent scale, there is a need for DevProd.

Tim Berglund:
Some of what you're describing sounds like DevOps. And I want to be careful with this question, because it sounds like I'm asking, are you the DevOps team? Obviously, there's no such thing as a DevOps team. But why is your work not DevOps? What is different about the way you think about that word? Again, I realize that if you get five engineers together and you want to get them to fight, you ask them what DevOps means, because there'll be six different definitions and they'll punch each other after a minute or two. But why is your work not DevOps? That's the question.

Mohinish Shaikh:
That's actually a very interesting question, and I'm sure there are all kinds of people who'll be ready to fight if you bring up that topic. I'll go with my definition. From what I understand, or based on my experience, I see DevOps as infrastructure-related tooling development. A DevOps team can be responsible mostly for, let's say, deployment, because that's where they need to go update a whole bunch of servers to deploy something, maybe change a bit, or change or update a file. These are also the teams that sometimes need to provision these instances. And this is where I was saying this landscape is super complex.

Mohinish Shaikh:
Sometimes people try to draw a thin line of differentiation between DevProd and DevOps. But in my definition, or my experience, number one, the landscape from commit to production is super complex, and it includes functions like build engineering, release engineering, packaging engineering, and Docker image building engineering. DevOps and SRE are all part of the same landscape. For example, SREs are more responsible for production operations. So if something fails in production, the tooling, workflows, or systems we have built as part of the DevProd landscape will notify them. Sometimes these teams manage and build these systems for themselves. So the scope is kind of shared across these roles. But in simple terms, for me, DevProd really means anything between commit and production, and some of these functions are just sub-functions within that larger function.

Tim Berglund:
My guest today has been Mohinish Shaikh. Mohinish, thanks for being a part of Streaming Audio.

Mohinish Shaikh:
Thanks Tim. Thanks for having me.

Tim Berglund:
Hey, you know what you get for listening to the end? Some free Confluent Cloud. Use the promo code 60PDCAST. That's 6-0-P-D-C-A-S-T to get an additional $60 of free Confluent Cloud usage. Be sure to activate it by December 31, 2021, and use it within 90 days after activation. Any unused promo value on the expiration date will be forfeit, and there are a limited number of codes available, so don't miss out. Anyway, as always, I hope this podcast was helpful to you. If you want to discuss it or ask a question, you can always reach out to me @tlberglund on Twitter. That's T-L-B-E-R-G-L-U-N-D. Or you can leave a comment on a YouTube video or reach out in our Community Slack. There's a Slack signup link in the show notes if you'd like to join. And while you're at it, please subscribe to our YouTube channel and to this podcast wherever fine podcasts are sold. And if you subscribe through Apple Podcasts, be sure to leave us a review there. That helps other people discover us, which we think is a good thing. So thanks for your support, and we'll see you next time.