
Datameer Blog

How Apache Kafka Works and Why

By Stefan Groschupf on March 1, 2016

I’m sure many of you know Jay Kreps, the man who developed Apache Kafka. Considering how well known it is now, it’s funny to hear Jay say that at first it wasn’t a popular open-source project. There were only a small number of enthusiastic fans (including me!), but for the most part, people weren’t sure what it was. Jay and his team initially called it a “messaging system,” but that didn’t get much attention. The industry finally took notice when it needed solutions for data flow and stream processing.

I was also really interested in hearing about Jay’s experience building Confluent. If you’d like to hear his thoughts on being a first-time entrepreneur, be sure to check out episode six.

How Apache Kafka Works

Episode 1: What is Apache Kafka and How Does It Work?


Learn why Jay Kreps created Kafka. What is it, and how does Apache Kafka work? Back then, he and his team were focused on solving the problem of having data spread out over many systems. Fun fact: they thought it was going to be easy, and it wasn’t.

Episode 2: How Does Apache Kafka Work? [Diagram]

Jay and I whiteboard the design. Who came up with Kafka’s design, and what did they learn from it? Originally, the challenge was one of representation. It’s clear how to represent a file, so it’s easy to make it a distributed file. But how do you represent a stream?

Episode 3: What is Apache Kafka Used For?

So what are the use cases around Apache Kafka and the problems it’s solving? Jay talks about data pipelines, and how you don’t have to think ahead of time about where the data’s going. You can publish, and others can tap into the data. The other main use case is stream processing – building applications that respond to data in real time.
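To make the "publish, and others can tap into the data" idea a bit more concrete, here is a minimal sketch in plain Python. It is not Kafka's API and not how Kafka is implemented; `ToyLog`, `publish`, and `poll` are hypothetical names made up for illustration. It only assumes the general idea that a producer appends records to a shared log and each consumer keeps its own read position, so the publisher never needs to know who is reading.

```python
from collections import defaultdict


class ToyLog:
    """Append-only list of records (an in-memory toy, not a real broker)."""

    def __init__(self):
        self._records = []
        # Each named consumer tracks its own read position (offset).
        self._offsets = defaultdict(int)

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return this consumer's next unread records and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = ToyLog()
log.publish({"user": "a", "event": "click"})
log.publish({"user": "b", "event": "view"})

# Two hypothetical consumers read the same data independently.
analytics_batch = log.poll("analytics")   # sees the first two records
log.publish({"user": "a", "event": "purchase"})
alerts_batch = log.poll("alerts")         # joins later, still sees everything
analytics_next = log.poll("analytics")    # only the record it has not read yet
```

The point of the design is that adding the "alerts" consumer required no change on the publishing side. A stream processing application would be one more consumer that polls in a loop and reacts to each batch as it arrives.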

Episode 4: Where Do Apache Kafka and Internet of Things Connect?

Kafka often comes up in IoT conversations. In Jay’s view, IoT projects turn to Kafka for its stream processing capabilities and for fine-grained analytics around feedback loops and data-driven products.

Episode 5: Let’s Talk Endpoint Compression & Apache Spark

What’s Jay’s philosophy around endpoint compression, and what are the future conversations going to be around that?

Episode 6: What It’s Like as a First Time Entrepreneur

It’s pretty difficult being an entrepreneur in Silicon Valley. Learn about Jay’s inspiration for founding Confluent and the future challenges he foresees.

Episode 7: What New Tech Are You Keeping an Eye On?

What gets Jay excited about what’s happening in the tech world? He talks about streaming data and stream processing. But he also makes a new prediction for databases – he sees another generation of database companies.



Stefan Groschupf


Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20-billion-dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker and contributor to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.
