Confluent Developer ft. Tim Berglund, Adi Polak & Viktor Gamov

Hi, we’re Tim Berglund, Adi Polak, and Viktor Gamov, and we’re excited to bring you the Confluent Developer podcast (formerly “Streaming Audio”). Our hand-crafted weekly episodes feature in-depth interviews with our community of software developers (actual human beings - not AI) talking about some of the most interesting challenges they’ve faced in their careers. We aim to explore the conditions that gave rise to each person’s technical hurdles, as well as how their experiences transformed their understanding and approach to building systems.

Whether you’re a seasoned open source data streaming engineer, or just someone who’s interested in learning more about Apache Kafka®, Apache Flink® and real-time data, we hope you’ll appreciate the stories, the discussion, and our effort to bring you a high-quality show worth your time.

Real-Time Machine Learning and Smarter AI with Data Streaming

January 05, 2023 • Confluent, founded by the original creators of Apache Kafka® • Season 1 • Episode 251

Are bad customer experiences really just data integration problems? Can real-time data streaming and machine learning be democratized in order to deliver a better customer experience? Airy, an open-source data-streaming platform, uses Apache Kafka® to help business teams deliver better results to their customers. In this episode, Airy CEO and co-founder Steffen Hoellinger explains how his company is expanding the reach of stream-processing tools and ideas beyond the world of programmers.

Airy originally built Conversational AI (chatbot) software and other customer support products for companies to engage with their customers in conversational interfaces. Asynchronous messaging created a large amount of traffic, so the company adopted Kafka to ingest and process all messages & events in real time.

In 2020, the co-founders decided to open source the technology, positioning Airy as an open source app framework for conversational teams at large enterprises to ingest and process conversational and customer data in real time. The decision was rooted in their belief that all bad customer experiences are really data integration problems, especially at large enterprises where data often is siloed and not accessible to machine learning models and human agents in real time.

(Who hasn’t had the experience of entering customer data into an automated system, only to have the same data requested eventually by a human agent?)

Airy is making data streaming universally accessible by supplying its clients with real-time data and offering integrations with standard business software. For engineering teams, Airy can reduce development time and increase the robustness of solutions they build.

Data is now the cornerstone of most successful businesses, and real-time use cases are becoming more and more important. Open-source app frameworks like Airy are poised to drive massive adoption of event streaming over the years to come, across companies of all sizes, and maybe, eventually, down to consumers.

SEASON 2
Hosted by Tim Berglund, Adi Polak and Viktor Gamov
Produced and Edited by Noelle Gallagher, Peter Furia and Nurie Mohamed
Music by Coastal Kites
Artwork by Phil Vo

🎧 Subscribe to Confluent Developer wherever you listen to podcasts.
▶️ Subscribe on YouTube, and hit the 🔔 to catch new episodes.
👍 If you enjoyed this, please leave us a rating.
🎧 Confluent also has a podcast for tech leaders: "Life Is But A Stream" hosted by our friend, Joseph Morais.

Kris Jenkins (00:00):

Hello, you are listening to Streaming Audio, and I think the word that sums up today's episode is usability. How do we take all these wonderful tools we've been building, like Apache Kafka, and Kafka Streams, and Kafka Connect and things like that, and then make them usable by as many people as possible? Because I'm a programmer, and as a programmer it's starting to feel like stream processing is going mainstream, but is it destined to stay mainstream just for programmers? Or can we build tools that open these ideas up to business analysts, data scientists, marketing people, maybe an AI model training team? Will those kinds of people be forever beholden to the programming sprint backlog? Or can we give them tools that can help them serve themselves? Well, my guest today certainly thinks we can do that. I'm joined by Steffen Hoellinger, who's co-founded a company called Airy.

Kris Jenkins (01:01):

And they started off building things like chat bots for better customer service. And they realized along the way that to build a better chat bot, you need to give it better data about the context of that chat, ideally in real time. So, you need to make pulling in and joining different data sets easy for the people that want to build their chat bots. And so, they're gradually expanding from making it easy to build bots, to making it easy to build general stream processing pipelines, ideally without too much coding. We talk about that journey, and we talk a lot about their ultimate goal, which is to make the final experience for your customers better. So, this episode is brought to you by Confluent Developer, and more about that at the end. But for now, I'm your host, Kris Jenkins. This is Streaming Audio. Let's get into it.

Kris Jenkins (02:00):

And joining me today is Steffen Hoellinger. Steffen, how are you doing out there?

Steffen Hoellinger (02:04):

Great, thank you, Kris. Pleasure to be on the show.

Kris Jenkins (02:07):

Good to have you. Are you over in Germany?

Steffen Hoellinger (02:10):

Yes, I'm at the moment in Berlin, Germany, and it's freezing cold outside.

Kris Jenkins (02:14):

Yeah, it's probably a lot worse than England, and England isn't that warm today. But we didn't come to talk about the weather. You are the CEO and one of the co-founders of Airy. And we're going to get into what that is in a minute, but I always wonder, because I read your bio and you co-founded this with two friends. And I always wonder in that situation, where were you in life when you thought, "Here's an idea worth building a company around"?

Steffen Hoellinger (02:42):

Yeah, I think it's quite interesting because me and my co-founders, we were always passionate about customer experiences, and how you can actually design those. And we were ultimately frustrated with how bad these still were despite the technology being available. So, we kept discussing how we can actually improve those and kind of put technology to use in a better sense to really create groundbreaking experiences for customers being customers ourselves. So, I think this was the starting point when we kind of realized we want to go into this, especially with machine learning and all these exciting things becoming more and more accessible, we really felt now is the time to get started.

Kris Jenkins (03:27):

So, you were sitting there thinking customer experiences are important, they kind of suck, and machine learning's fun, we could put them together somehow? Or was it more crystallized than that?

Steffen Hoellinger (03:40):

It was definitely a more crystallized journey over a couple of years discussing back and forth. So, I was involved in a company called Delivery Hero. My co-founder was one of the people that started Groupon in the UK, and I think we were both passionate about the subject and were really, being customers ourselves, frustrated with airlines and other companies that you had to call on the phone, wait in line forever. And then ultimately you're being connected from one person to the other, you keep telling your story over and over again, and you think, "Why is the company not able to bring the data that they already have on me together in a more meaningful way so that I can have a more joyful experience as a customer, and the company can save money at the same time?"

Kris Jenkins (04:27):

Yeah, I think everyone's had that experience where you enter your account number via the automated dialing system, and then someone finally picks up the phone and says, "What's your account number?" That's the classic-

Steffen Hoellinger (04:37):

Absolutely, absolutely. And in the end, they have you read your credit card number aloud on the phone, which is not only insecure but I mean, a stupid task for a human to type that in over the phone.

Kris Jenkins (04:48):

Yeah. So, from that universal experience of customer service sucking, which we've all had, you went into two of my favorite topics, real-time data and machine learning. So, reveal. What is the business? What does Airy actually do with those two great topics?

Steffen Hoellinger (05:07):

Yeah. Airy is an open source app framework for data streaming. So, what we do is we ingest data and we stream that on top of Apache Kafka, and then we enable you to build microservices around that or to plug in standardized connectors so that it talks to your business systems. So, it can pull in data and it also can send data to other systems, and also trigger workflows inside the organization.

Kris Jenkins (05:38):

Right. But so far that sounds like Kafka, that sounds like the pitch for Kafka. So, what are you adding on top of that?

Steffen Hoellinger (05:47):

I think we're adding a layer of standardization that ultimately enables people to get started much faster. So, we want to make it more accessible, not only to let's say the people that studied computer science and have 10 years of experience in data streaming applications, plus coding and Java, and all of these kind of things. But we ultimately want to make it more accessible because we believe it's a groundbreaking technology and it should actually make its way to the hands of more people. And this is what we're trying to do with Airy.

Kris Jenkins (06:19):

Right. Yeah. So, where does ML come into that?

Steffen Hoellinger (06:24):

ML comes in because we started actually out on the conversational AI and conversational experience side of things. So, when we started Airy, we kind of were playing around with chat bots, obviously was the high season before everybody went into crypto and now I think they're leaving again. So, they might come back to AI and machine learning topics. But yeah, I think this was the time when we started Airy, and we never really believed to automate too much too soon, but to rather enable a human assisted experience with the help of machine learning so that basically in the end it's still a human that has to decide most of the things, like are you entitled to get a refund? Can I rebook you on this flight? Because the systems are not perfectly integrated, especially in a larger enterprise, it's sometimes difficult to even access the data that is sitting in some silos, or there's not even an API to automate some of these things.

Steffen Hoellinger (07:21):

So, you kind of need the human in the middle. And so, we always were trying to integrate with existing systems that the company's already using for contact center software, for example, or help desk software like Zendesk. So, if the company was already using that, we try to leverage chat bots plus the existing software stack of the company. And in that regard we built out standard integrations with these kind of, both conversational AI systems, there are a bunch available from all the big cloud companies, obviously. Also, some independent ones, some of them are open sourced. So, we offer integrations with all of them, plus we integrate with contact center software and business systems, ERP systems, CRM, et cetera.

Kris Jenkins (08:14):

Right. So, I'm not entirely clear yet. What's the different customer experience that someone's having now?

Steffen Hoellinger (08:23):

Yeah, I think it boils ultimately down to the fact that in conversational AI historically, because all of these systems were built in Python, let's say the whole topic of joining data comes more as an afterthought. So, it's really the case that often these systems try to, when you have an incoming message, for example, somebody's reaching out to you as a company over WhatsApp, or via Instagram, or Messenger, or on Google Maps for example. You have an incoming request, and then you do intent recognition first. And then after you recognize the intent with a certain confidence level, you need to kick off a so-called action. And then you send out one or several API calls and you reintegrate that. Often that is not leading to the designated results, plus we have a problem even reintegrating the API responses.

Steffen Hoellinger (09:21):

So, what we try to bring to the table is we try to enrich the context of an incoming event already before it hits the ML model. In this case, we're talking often about large language models that are used for the purpose of intent recognition. So, we try to build in dynamic features into the model that enable it to understand the context, for example, in which the customer's reaching out to a specific company much better. So, in that regard, joining the data already on the level of the data streaming system before it hits the conversational AI model.

Kris Jenkins (10:03):

So, I mean, I've had this experience with chat bots on websites. You say, "Why hasn't my order 5678 shown up?" And all you get back is like an FAQ search index pretending to be a human being, and it kind of sucks. And you're saying it's because they haven't wired their chat bot into the order system that could potentially recognize the order ID I've just told them about.

Steffen Hoellinger (10:30):

Absolutely. So, it's often not even obvious which order system you mean. Especially within a large company, they have several systems side by side. So, if you don't pull in the data first who you even are as a customer, the system will have a hard time even finding you inside one of the silos.
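
To make the "join first, then recognize intent" idea concrete, here is a minimal Python sketch. It is not Airy's code: the in-memory `CUSTOMER_CONTEXT` lookup, the `enrich` and `recognize_intent` functions, and the confidence numbers are all hypothetical stand-ins, and the toy rule-based recognizer only takes the place of a real model. The point is that the same message yields a more confident, more specific result once the customer's order data is attached.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for customer data that would normally arrive on
# other streams (CRM, order system, and so on) and be joined in the stream.
CUSTOMER_CONTEXT = {
    "cust-42": {"open_orders": ["5678"], "channel": "whatsapp"},
}


@dataclass
class EnrichedMessage:
    customer_id: str
    text: str
    context: dict = field(default_factory=dict)


def enrich(customer_id, text):
    """Join the incoming message with whatever is known about the customer."""
    context = CUSTOMER_CONTEXT.get(customer_id, {})
    return EnrichedMessage(customer_id, text, dict(context))


def recognize_intent(message):
    """Toy intent recognizer; a placeholder for a real model.

    Returns an (intent, confidence) pair. The numbers are made up.
    """
    known_orders = message.context.get("open_orders", [])
    if any(order_id in message.text for order_id in known_orders):
        # The message mentions an order we can actually find for this customer.
        return "order_status", 0.9
    if "order" in message.text.lower():
        # Looks like an order question, but there is nothing to match it to.
        return "order_status", 0.5
    return "faq", 0.3


question = "Why hasn't my order 5678 shown up?"
with_context = recognize_intent(enrich("cust-42", question))
without_context = recognize_intent(enrich("unknown-customer", question))
```

Here `with_context` carries a higher confidence than `without_context` for the identical text, which is the difference between answering the question and falling back to an FAQ search.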

Kris Jenkins (10:51):

Yeah, because that's a classic thing with customer support people, they often need access, just when they're human beings, they need access to a lot of different systems to actually answer your question.

Steffen Hoellinger (11:01):

Exactly. And this is for example, what we do as well. So, when the confidence level of the intent, for example, is too low we can also suggest the response that we believe is the right response to the agent. And then the agent acts almost like a trainer. So, we can have a feedback loop there that is increasing our confidence over time, kind of retraining the model while the customer support people kind of select the right answer. But as you said, yes, most of these models at the moment they are not personalized at all. So, they're automated, yes, but they're just automated FAQ models, and as soon as you ask about a specific order they basically cannot help you anymore and will always try to connect you with a human being that can help you. So, you're back to square one.
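
The human-in-the-loop pattern described here can be sketched in a few lines of Python. Everything in this example is an assumption for illustration: the `CONFIDENCE_THRESHOLD` value, the `Prediction` type, and the `route` and `record_agent_choice` helpers are hypothetical and not taken from Airy. It shows low-confidence predictions being handed to an agent as suggestions, with the agent's final answer saved as a labeled example for later retraining.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff, not a value from the episode


@dataclass
class Prediction:
    message: str
    suggested_reply: str
    confidence: float


# (message, reply) pairs collected from agents, to be used for retraining.
training_examples = []


def route(prediction):
    """Reply automatically when confident, otherwise suggest to a human agent."""
    if prediction.confidence >= CONFIDENCE_THRESHOLD:
        return "auto_send"
    return "suggest_to_agent"


def record_agent_choice(prediction, chosen_reply):
    """Keep the agent's final answer as a labeled example (the feedback loop)."""
    training_examples.append((prediction.message, chosen_reply))


low = Prediction("Where is my parcel?", "Your parcel is on its way.", 0.55)
if route(low) == "suggest_to_agent":
    # The agent reviews the suggestion and either accepts or corrects it.
    record_agent_choice(low, "Your parcel is delayed and should arrive Friday.")
```

Over time the collected `training_examples` are what would let a model earn a higher confidence on the same kind of question.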

Kris Jenkins (11:49):

Yeah. They do kind of suck. They're just often, like I say, the search index pretending to be a human being. So, take me through the actual event because the structure of this must be interesting. You've got a mixture of realtime data coming through, kind of semi realtime joining to data silo databases stuff. You've presumably got a big batch job training the machine learning models in there too. That's lots of fun data stuff. So, give me the architecture diagram in my head.

Steffen Hoellinger (12:27):

Yeah, so what you actually get with Airy is kind of a layer running on Kubernetes on the side of Kafka. So, if you use the open source version, you get an embedded Kafka cluster inside the Kubernetes cluster, or you can also use it and connect it to an existing Kafka cluster that you are already using inside your organization. For example, the ones that Confluent is offering. So, this is supported as well-

Kris Jenkins (12:58):

What an excellent choice. I shouldn't be saying that, sorry.

Steffen Hoellinger (13:04):

Yes, it makes a lot of sense given that, I mean, some of the largest banks, insurance companies, automotive companies, telco companies, they already are customers of Confluent's. So, it makes total sense to allow them to use their existing infrastructure also from a data perspective, because we want to integrate with that data that they're already streaming. Plus we actually want to enable their teams to go to production much faster, because we offer them a bunch of components inside the system. And this means both on the UI side but also on the site to enable engineers, for example, to kind of build out prototypes and bring them to production much, much faster and in a more robust way. This is the reason why we position us as an app framework. And in that regard, we have several components that come with Airy, plus a variety of connectors that you can put in, and we offer SDKs to write your own connectors.

Steffen Hoellinger (14:04):

So, in case we don't offer a connector yet, we actually enable people to write their own connectors. And our connectors in that sense are a bit different than, for example, the source and sink connectors that Confluent is offering. But because everything runs on Kafka, it actually can be combined with those. So, for example, when we want to write to a data lake or a data warehouse, we would not write a new connector, but we would basically just enable people to use an open source connector or just install the Confluent connector. But when we're ingesting data, for example, from a conversational source like WhatsApp, there is currently no connector for that in the Confluent ecosystem. So, we allow people kind of to subscribe to, in this case the webhook that is sending events, and we are ingesting those events into the cluster and then distributing them obviously over the amount of nodes you need.

Steffen Hoellinger (15:06):

Plus we then enable different services to consume and produce through the stream, so to speak, for example, to join events to kind of enrich the incoming events, for example, via conversational AI integrations, and then ultimately trigger back a response so that there is an immediate reaction to the customer. But also, for example, to write all the data at the same time streaming that into a data lake so you can actually analyze that later. You can train your machine learning models in batch mode asynchronously on all the historic data. At the same time we believe there is a unique quality of realtime data, bringing that together. So, you not only have all the historic data to train your models, but you also make them more context aware and you enable them to react to things that happen in real time, which is ultimately important for some use cases. Not only for chatting with a customer, but also for making decisions and predictions on the fly.
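
The shape of that architecture (webhook events land in a log, and several independent services read the same events) can be illustrated without a broker. The sketch below is a plain-Python simulation under stated assumptions: the `Topic` class is an in-memory stand-in for a Kafka topic with per-consumer offsets, and `handle_webhook`, the consumer names, and the payload fields are hypothetical. It is meant to show why a real-time responder and a data lake sink do not interfere with each other, not how Airy or Kafka implement it.

```python
class Topic:
    """In-memory stand-in for a Kafka topic: an append-only list of events."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer name -> next position to read

    def produce(self, event):
        self.events.append(event)

    def poll(self, consumer):
        """Return events this consumer has not seen yet and advance its offset."""
        start = self.offsets.get(consumer, 0)
        batch = self.events[start:]
        self.offsets[consumer] = len(self.events)
        return batch


messages = Topic()
replies = []    # stand-in for responses sent back to the customer
data_lake = []  # stand-in for the target of a sink connector


def handle_webhook(payload):
    """Ingest one webhook payload, for example from a messaging channel."""
    messages.produce(payload)


handle_webhook({"customer": "cust-42", "text": "hello"})
handle_webhook({"customer": "cust-7", "text": "refund please"})

# Two independent consumers read the same events at their own pace.
for event in messages.poll("responder"):
    replies.append({"to": event["customer"], "text": "Thanks, we are on it."})
for event in messages.poll("lake-sink"):
    data_lake.append(event)
```

Because each consumer tracks its own offset, the responder can react immediately while the sink accumulates the full history for batch training later.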

Kris Jenkins (16:19):

Okay. So, the way this would work then is I start off with connecting to, I don't know, a Postgres database of customer orders, and WhatsApp, and a naive initial model, ML model. I get a whole bunch of real life responses going out to another topic, and then I run that back over the ML model to try and improve it.

Steffen Hoellinger (16:43):

Ideally, yes. In fact, there's sometimes more to it and we have to actually jump in a lot and try to help customers. So, some customers already do, I think, a great job. They're having their own ML teams that actually use this to add features to their models in real time, which often is a big problem for them because effectively the ML engineers, they request a bunch of features and then the data engineers tell them, "Yeah, actually I can give you only a part of that." So, we try to enable also in this case ML engineers to kind of supply their models with real time data much easier, so they don't have to go and build a data pipeline every single time, but we can try to enable them doing it on their own without actually making a big request to the backend team or the data engineering team first. But ultimately I think you summarized it quite well.

Kris Jenkins (17:45):

Okay. I think I'm almost interested more in the places where it doesn't go so smoothly, because that's where we can really learn what the difference we're trying to make is in building these systems, right?

Steffen Hoellinger (17:59):

Yes.

Kris Jenkins (17:59):

So, without naming any names, tell me a customer that really struggled and you helped out.

Steffen Hoellinger (18:06):

Yeah, I think in the end it's always this thing that people have this experience in mind, maybe inspired by movies like Her, which we are big fans of.

Kris Jenkins (18:18):

Oh yeah, that's a great film.

Steffen Hoellinger (18:19):

And ultimately we want to go down that route and enable technology to stay in the background, and people basically can live their pleasant lives and in the background technology is solving all the issues for you with the help of machine learning. We're not totally there yet, I assume, and the whole industry is not. So, it's pretty exciting what is currently going on with all these new trends in terms of generative AI and GitHub Copilot, and all of these other things in that aspect. But yeah, we're I would say in the earliest days of defining these experiences, and at the moment it's often a human trying to achieve a certain result, like getting a refund, and then achieving that is pretty complicated.

Steffen Hoellinger (19:08):

Even if you have access to people that want to push this at companies, and a lot of resources that they want to dedicate to improving the experiences in practice, often it's hard to achieve that result. And often what we see is you struggle, I mean, even at the earliest stages where you need to make sense, for example, what location somebody is referring to. So, we're trying to basically help people getting the basics right without trying to jump to an automated solution too soon, because when you enable those kind of experiences you can already make a big difference. You might not be able to fully automate a request, but you can actually help the human agent that will eventually get the request anyways to have more context about what is the customer inquiring about. So for example, what we often do is we try to make sense of where is the customer coming from, what is the right context?

Steffen Hoellinger (20:15):

And then not only try to write that context towards the model, because the model might not be trained on that aspect, but the human agent is effectively able to take that into consideration. And this is what we're trying to solve while getting there step-by-step, and automating more and more over time.

Kris Jenkins (20:39):

So, does that mean you're doing things like, let me think of an example, your automated agent can't actually process a refund but it can tell that you are talking about a parcel not arriving at home. So, it mixes in with the order, the home address, and maybe some details about the parcel carrier that goes to that area. And if it can't automate the solution, it's automating, bringing the context together.

Steffen Hoellinger (21:07):

Yes. Or it can basically, let's say, you can aggregate in the stream, for example, the information that this customer that is inquiring at the moment to get a refund is ordering once a week, and the total purchase value in the last year was a few thousand dollars. So, you might want to treat that customer in a preferred way compared to other customers. So yeah, we for example would write a ticket into Zendesk for the agent to handle that. And actually in that ticket provide not only the information, what the customer's inquiring about, but also put in relevant metadata information that enables the agent to ultimately decide if the customer is entitled to get a refund.
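
As a rough sketch of that aggregation, the Python below folds order events into per-customer statistics for the trailing year and attaches them to a support ticket. The event fields, the `aggregate_orders` and `build_ticket` helpers, and the `preferred_min_value` threshold are assumptions made for this example; the ticket is a plain dictionary, not a Zendesk API call.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_orders(order_events, today, window_days=365):
    """Count orders and sum their value per customer within the trailing window."""
    stats = defaultdict(lambda: {"order_count": 0, "total_value": 0.0})
    cutoff = today - timedelta(days=window_days)
    for event in order_events:
        if event["date"] >= cutoff:
            entry = stats[event["customer"]]
            entry["order_count"] += 1
            entry["total_value"] += event["value"]
    return stats


def build_ticket(customer, request, stats, preferred_min_value=1000.0):
    """Attach the aggregated context to the ticket the human agent will see."""
    entry = stats.get(customer, {"order_count": 0, "total_value": 0.0})
    metadata = dict(entry)
    metadata["preferred"] = entry["total_value"] >= preferred_min_value
    return {"customer": customer, "request": request, "metadata": metadata}


orders = [
    {"customer": "cust-42", "date": date(2022, 12, 1), "value": 600.0},
    {"customer": "cust-42", "date": date(2022, 6, 15), "value": 700.0},
    {"customer": "cust-42", "date": date(2021, 1, 1), "value": 900.0},  # too old
]
stats = aggregate_orders(orders, today=date(2023, 1, 5))
ticket = build_ticket("cust-42", "refund", stats)
```

The agent still makes the refund decision; the aggregate only saves them from looking the history up in several systems.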

Kris Jenkins (21:52):

Okay. But are you making that decision, because including that data that feels to me just a join, but are you also making a decision about which data you could include but don't? Are you cleaning up the context or are you just throwing everything at the customer service person?

Steffen Hoellinger (22:13):

I mean, cleaning it up is often a bit difficult because it depends how you actually get to the schema of the data. And often the schema is, let's say a bit wild, especially when you integrate with a bunch of services at the same time. So, we talked to one airline, I think they integrated 27 services in parallel, and the only way to resolve that was by having a human manually put in the tasks, because you could not possibly automate that anymore. So, what we're trying to do there is we're trying to keep the data as close as possible to the original event, and we extract metadata from that. So, that often works, but not always. So, in case we find the right metadata, we would always expose that metadata to the human agent, or to the model respectively. But yeah, let's say if you see new event types, there might be some manual work required to actually get that into the model.

Kris Jenkins (23:15):

Okay, okay. Yeah, so we're actually moving partly into another one of our favorite topics on Streaming Audio, which is data mesh, right? The idea that you've got all these data sets you need to clean up and get in a decent publishable productized format before you even join them. Do you find yourselves going into customers and becoming data mesh consultants as well?

Steffen Hoellinger (23:41):

I mean, ultimately we're not trying to because we try to enable outputs. But yeah, I think this is a big trend that is currently going on. And also, speaking of that is actually a really good topic in terms of the disposable character of data. So, in that regard, I think enabling people to, let's say, get access to data and ask specific questions, write it into a store, get their job done, and then ultimately just throw it away, I think this is really groundbreaking in that aspect. And what we believe in is that ultimately people should store events for much longer also in the stream, because the stream is actually the perfect, let's say component in that aspect, to kind of don't have to do reverse ETL and all of these kind of things afterwards because you wrote the data in a specific schema in your data warehouse, and then it sits there as it is.

Steffen Hoellinger (24:41):

But this is also something we had to learn the hard way, because in the earliest days when you start out you write ... We actually did versioning and we had to migrate a lot of the data whenever basically something changed in the schema. So, we ultimately stopped doing that at all. So, we tried to stay as close as possible to the original event and basically take that so that we can even travel in time. So, we can even retroactively, let's say, put a connector there and say, "Replay all the events from the last six to nine months," and then you will end up having ... Or you will ultimately see if the model is also holding true for that integration.
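
The "travel in time" idea depends on keeping events in their original form so that logic written later can be run over old data. The following Python sketch simulates that with an in-memory list; `event_log`, `replay`, and `channel_extractor` are hypothetical names, and the dates are invented. With a real Kafka cluster, a consumer would instead look up offsets by timestamp (the Java client exposes this as `offsetsForTimes`) and read forward from there.

```python
from datetime import datetime, timedelta

# Raw events are stored unchanged, each with the time it was received.
event_log = [
    {"ts": datetime(2022, 2, 10), "raw": {"channel": "whatsapp", "text": "Old question"}},
    {"ts": datetime(2022, 9, 3), "raw": {"channel": "instagram", "text": "Where is my order?"}},
    {"ts": datetime(2022, 12, 20), "raw": {"channel": "whatsapp", "text": "Refund please"}},
]


def replay(log, since, handler):
    """Run a handler over every raw event received at or after `since`."""
    return [handler(entry["raw"]) for entry in log if entry["ts"] >= since]


def channel_extractor(raw):
    """A 'connector' written after the events were stored: extracts new metadata."""
    return {"channel": raw["channel"], "length": len(raw["text"])}


now = datetime(2023, 1, 5)
six_months_ago = now - timedelta(days=182)
replayed = replay(event_log, six_months_ago, channel_extractor)
```

Because nothing was migrated into a fixed schema, the new extractor works on the older events without any data migration step.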

Kris Jenkins (25:27):

Yeah. So, you can use it as your master source of truth. Kafka is this kind of universal glue and event store.

Steffen Hoellinger (25:37):

I mean, you should not maybe tell that to some people because there are a lot of people out there that would say that's the wrong way to use it because ultimately Kafka's not a database, but we have some topics where we keep retaining events indefinitely at the moment. Because we believe that, especially in the communication context, everything can become relevant again, even if it's a year old.
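
For readers who want to see what "retaining events indefinitely" amounts to, here is a small Python sketch of Kafka's time-based retention setting. The facts it relies on are that Kafka's default time-based retention is seven days and that a topic-level `retention.ms` of `-1` disables the time limit. The topic name `conversation-events` and the `is_retained` helper are hypothetical, and the sketch ignores size-based retention and log compaction.

```python
from datetime import timedelta

# Kafka's default time-based retention is seven days, expressed in milliseconds.
DEFAULT_RETENTION_MS = int(timedelta(days=7).total_seconds() * 1000)

# Topic-level override: retention.ms = -1 means no time limit for that topic.
# The topic name is a made-up example.
topic_overrides = {
    "conversation-events": {"retention.ms": "-1"},
}


def is_retained(event_age_ms, retention_ms):
    """Return True if an event of this age survives time-based retention."""
    return retention_ms == -1 or event_age_ms <= retention_ms


one_year_ms = int(timedelta(days=365).total_seconds() * 1000)
kept_by_default = is_retained(one_year_ms, DEFAULT_RETENTION_MS)
kept_with_override = is_retained(
    one_year_ms, int(topic_overrides["conversation-events"]["retention.ms"])
)
```

A year-old conversation event would be gone under the default but still available on a topic configured this way.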

Kris Jenkins (25:58):

Yeah. I think one day we have to have someone on Streaming Audio that we can debate is Kafka a database with, because that'll be a good fight and I have strong opinions of my own. And another thing we have to have someone discuss one day is why the default retention period is seven days, because I don't understand-

Steffen Hoellinger (26:20):

That's a big mystery.

Kris Jenkins (26:21):

... why it's not infinity and then you decide to throw away data if you want to later. Anyway. So, it's interesting, these whole repeating problems we see of getting different data sets cleaned up, connected together, stored in an interesting way, passed back out. I wonder what experience the developer has with your tool set, because I can see in my mind how that works with a combination. If I wanted to do something similar with Kafka Streams and a bunch of connectors, I can sort of see how that works. No, I can see how that works, but what's Airy like? What's the difference there?

Steffen Hoellinger (27:05):

Yeah. I mean, ultimately this is what we are doing. We're using Kafka Streams inside the Airy platform. We're just standardizing a lot of the code so you actually save yourself quite a bit of time. But if you have an 80-person team that can actually write all these streaming applications all day long, of course. I mean, this is what a lot of large companies do at the moment, but there is also no alternative at this point. So, what you're getting with Airy is basically a boilerplate framework that can save you a lot of time. Not for all the use cases that you might have, but it depends on the use case. And we're trying to learn. We just spent, for example, now three times in Bentonville, Arkansas being part of the Fuel Accelerator program, which is sponsored by Walmart and the Walton family.

Steffen Hoellinger (27:57):

So, we were actually exposed to quite a lot of these problems that are inside of a big organization like Walmart where you have a lot of resources, a lot of engineers basically working with these connectors, and they're working with building streaming apps all day long, mostly in this case for supply chain use cases. But ultimately it boils down to the fact can you enable these teams to build a prototype faster, and then bring it to production also much faster? And I think for example, Walmart, they did a great job in building a realtime event stream for inventory control and supply chain, but it took them about two years to build the application and roll it out to 5,000 stores. Given how slow some of these large enterprises normally move, maybe that's already light speed, but what we're trying to bring to the table with Airy is actually that you can build these experiences much faster.

Steffen Hoellinger (28:55):

So, what it gives you, if you are let's say an experienced backend engineering team, you can actually build on top of it so you don't have to write all the code once more, but it gives you a nice framework within which you can actually build your streaming apps. And you can also mix, you can basically use existing connectors that we offer, for example, to ingest data from Salesforce, or from other sources, or to basically write the workflow components that we have inside the system to, let's say, create something, change something. So, this is a more active approach on application level as compared to, let's say, the Confluent connectors that are mostly about streaming the entirety of data in a specific source to another system. So, then maybe have some transformations and microservices in the middle. So, we're in that regard working on a different abstraction level.

Steffen Hoellinger (29:49):

So, it's more granular I think, if you want to get something done and something reactive in real time. So, this is, I think, quite complementary to that aspect. And what you also get, for example, is you get a nice UI control center where you basically can see the status of all your components. We offer some UIs out of the box. So, we offer inbox components, for example, where whenever you need a conversational interface, both for let's say the employee working inside of a company that might need to decide about something. So, we believe conversational interfaces are really powerful in that aspect so we can always power one up. And you can actually plug your existing solutions also for monitoring purposes, Grafana, et cetera, you can plug that in into the system.

Steffen Hoellinger (30:45):

So, we offer also standardized ways of integrating there. So, it generally gives you, I would say, an advantage: you come to production much faster and save some resources, because let's say when you're just a small team, not 80 people, but you only have one or two people, you might actually get where you want to go much faster, plus you ultimately end up not making some of the mistakes that we made. So, we tried to make the system also quite robust in that sense.

Kris Jenkins (31:18):

So, could you see a team using a mixture of this to get a fast prototype up and running, and then mixing in maybe some very custom Kafka Streams stuff to do specific, difficult processing?

Steffen Hoellinger (31:31):

Absolutely. I think this is what we ultimately set out to do: enable these kinds of experiences for developers. But apart from that, I think we also have people in mind that, let's say they might not be streaming experts, but they have a use case where streaming would make a big difference. So, I already mentioned machine learning engineers where we believe that real-time features are often missing components in the models that they train. So, this is a group that we are caring about, but also let's say looking at business teams. So, some of the mid-market companies that we work with, they don't have a lot of streaming experience, they don't have a lot of engineering resources available. So, for them, it's always a trade-off. Do they basically invest in this project or another? And then let's say something like streaming, especially if you don't have the right people that push it inside the organization, often is not prioritized.

Steffen Hoellinger (32:31):

So, they basically say, "We do it later," but ultimately this, again, harms the customer experience from our perspective because we actually believe that many more companies should do data streaming and should use the capabilities of event streaming. So, we want to make it more accessible even to these kinds of companies. And some companies we actually work with, they don't even know that there is a streaming engine like Kafka running under the software that they use, but basically they just use Airy, they plug in a few of the connectors, for example Instagram or Salesforce, and then they basically pick a conversational AI tool and then they have a running prototype without ever touching a line of code because everything is working in a nice control center UI.

Kris Jenkins (33:18):

Yeah, that's very satisfying because I like touching lines of code and no one's taking that from me. But I also like the idea that other people who aren't quite of my persuasion can get this data moving around in their systems and actually build something easily. That makes me think though, you said you did some work in the incubator for Walmart and the Walton family. So, perhaps we can move on to this stage of the questioning, what state is Airy in as a company? You've launched. Are you still finding your market fit? Where are you as a company?

Steffen Hoellinger (33:59):

Yes. I would say we are still in the early stages of finding our market. So, we're in discussions with, let's say, some of the largest enterprises in the world where we try to basically bring them event streaming for some of the contexts that we've talked about. So, obviously that involves talking to the data streaming advocates within these companies. But also, for example, a group of people we care about is conversational teams, and all of these large companies, they have sometimes large conversational teams that at the moment are composed of product managers, AI trainers, conversation designers, for example. So, we also partnered with the Conversation Design Institute, for example, to offer a course which is called Conversational Engineering, where we want to bring in the component that you actually need engineering as part of these teams, because at the moment it's often a discipline that is not present in these teams. Ultimately we believe that building these customer experiences, and this is what the conversational teams care about, is effectively a data integration problem.

Steffen Hoellinger (35:09):

Because you need to bring the data, for example, about who you are as a customer, you need to bring that to the model so that you can actually personalize the experience. And we believe down the road you will actually train much more personalized models. So, right now I think these teams are often occupied with automating things, but they are not enabled at the moment to even access the data that is maybe sitting in some silo within the organization. So, we want to make it possible to actually build really great customer experiences.

Kris Jenkins (35:42):

Yeah. I've heard people say, half jokingly, that like 90% of ML and 90% of data science isn't actually doing ML and data science, it's trying to connect data sources together and groom the data quality, and all that rubbish.

Steffen Hoellinger (35:58):

Exactly.

Kris Jenkins (35:59):

If you can get past that, you're doing the world a favor.

Steffen Hoellinger (36:03):

We're trying.

Kris Jenkins (36:04):

I'm going to ask, so the one thing we are unable to do on the podcast is show any diagrams, because this is radio. So, if someone actually wants to see this in action, where should they go?

Steffen Hoellinger (36:14):

They can actually go to our website, which is Airy.co, and they can just download the software there, which is open source. So, they can play around with it. We recommend installing it in the cloud of your choice. You can also run it on-premises, which is obviously a requirement if you're a big bank or insurance company and you have some servers in your basement. You can also run it on your local machine if your notebook is powerful enough. Yeah, you can actually do that or you can reach out to us and we will get you started with a test instance and some support to bring your use case to life as soon as possible.

Kris Jenkins (36:58):

Cool. Cool. Well, I wish you luck, both with the customer experience and the developer experience on the journey.

Steffen Hoellinger (37:06):

Thank you so much.

Kris Jenkins (37:06):

Cheers for joining us. Catch you again.

Steffen Hoellinger (37:08):

Thank you.

Kris Jenkins (37:08):

Thank you, Steffen. Now, if you want to check out Airy's tool set, either to use it directly or to inspire tools that you are building for your users perhaps, you'll find a link in the show notes or head to Airy.co/developers, which is their developer portal. However, if you're already feeling more than inspired enough and you want to get building, just hold on for one second while you click the like button, and the subscribe button, and comment box, and all those great things. We'd love to hear from you as always. And as always, you'll find my contact details in the show notes if you want to contact me directly. For more of our thoughts as a company on stream processing tools, head to developer.confluent.io for courses, tutorials and blog posts about building your first pipeline and making your second pipeline work even better.

Kris Jenkins (38:02):

If you need a place to run those stream processing pipelines, check out our cloud service for Apache Kafka at Confluent Cloud. It's fast to get started, it will scale to enterprise sizes, and if you add the code PODCAST100 to your account, you'll get $100 of extra free credit to run with. So, give it a try. And with that, it remains for me to thank Steffen Hoellinger for joining us, and you for listening. I've been your host, Kris Jenkins, and I will catch you next time.