
Kafka on Object Storage Was Inevitable. The Next Step Is Open.

Kafka on object storage is not a trend. It is a correction. WarpStream proved that the Kafka protocol can run without stateful brokers by pushing durability into object storage. The next logical evolution is taking that architecture out of vendor-controlled control planes and making it open and self-hosted. KafScale is built for teams that want Kafka client compatibility, object storage durability, and Kubernetes-native operations without depending on a managed metadata service.

The problem was never Kafka clients

The Kafka protocol is one of the most successful infrastructure interfaces ever shipped. It is stable, widely implemented, and deeply integrated into tooling and teams. The part that aged poorly is not the protocol. It is the original broker-centric storage model.

Stateful brokers made sense in a disk-centric era where durability lived on the same machines that ran compute. That coupling forces partition rebalancing, replica movement, disk hot spots, slow recovery, and persistent overprovisioning.

WarpStream proved the core idea

WarpStream demonstrated that Kafka compatibility does not require broker-local disks. By storing log segments in object storage and running stateless compute to serve the Kafka protocol, they showed that elastic scaling and simplified operations are possible without changing clients.
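The write and read paths of such a design can be sketched in a few lines. This is an illustrative model only: the segment key layout, batching, and in-memory stand-in for S3 are assumptions for the example, not WarpStream's actual format.

```python
# Minimal sketch of a diskless Kafka write/read path: a stateless broker
# batches records and uploads them as an immutable segment object, so
# durability lives in the object store rather than on broker disks.
# Key layout and storage stand-in are hypothetical.
import json

object_store = {}  # stands in for S3: key -> bytes

def flush_segment(topic: str, partition: int, base_offset: int, records: list) -> str:
    """Upload a batch as one immutable segment object and return its key."""
    key = f"{topic}/{partition}/{base_offset:020d}.segment"
    object_store[key] = json.dumps(records).encode()  # PUT to object storage
    return key

def read_from(topic: str, partition: int, offset: int) -> list:
    """Serve a fetch by scanning segment objects; any stateless broker can do this."""
    out = []
    prefix = f"{topic}/{partition}/"
    for key in sorted(k for k in object_store if k.startswith(prefix)):
        base = int(key[len(prefix):].split(".")[0])
        for i, rec in enumerate(json.loads(object_store[key])):
            if base + i >= offset:
                out.append(rec)
    return out

flush_segment("orders", 0, 0, ["a", "b", "c"])
flush_segment("orders", 0, 3, ["d"])
print(read_from("orders", 0, 2))  # -> ['c', 'd']
```

Because segments are immutable and named by base offset, any broker replica can serve any fetch, which is what makes the compute layer disposable.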

Their work validated an architectural shift that many teams suspected was viable but had never seen proven at scale.

When Confluent acquired WarpStream in 2024, the deal confirmed that Kafka on object storage is no longer an experiment but a mainstream direction for streaming platforms.

You can read more about WarpStream’s architecture and approach on their site: warpstream.com

What comes after “diskless Kafka”

Once an architecture becomes credible, teams stop asking whether it works and start asking whether it fits their constraints.

WarpStream, like many modern streaming services, relies on a vendor-managed control plane for metadata and coordination. For many organizations, that is a reasonable trade-off. For others, it is a hard blocker.

  • Regulated environments that cannot depend on external control planes
  • Sovereign and private cloud deployments
  • Teams that require open-source licensing and forkability
  • Platforms that prefer self-hosted economics over managed margins

The next logical evolution: open and self-hosted

KafScale is an open-source, Apache 2.0 licensed streaming platform that applies the stateless, object-storage-backed Kafka model in a fully self-hosted form. It runs on Kubernetes, stores durable log segments in S3, and uses etcd for topic metadata, offsets, and consumer group coordination.
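The kind of state such a self-hosted metadata layer holds can be sketched as a key space. The key names and JSON shapes below are hypothetical, chosen to illustrate the categories named above (topic maps, offsets, consumer group state), not KafScale's actual etcd schema.

```python
# Hypothetical etcd key space for a self-hosted streaming control plane.
# Keys and value shapes are illustrative, not KafScale's real schema.
metadata = {
    # topic map: partition count and retention
    "/topics/orders": '{"partitions": 4, "retention_ms": 604800000}',
    # segment index: maps a base offset to its S3 object
    "/segments/orders/0/00000000000000000000": '{"s3_key": "orders/0/0.segment"}',
    # committed offset for a consumer group on one partition
    "/groups/billing/offsets/orders/0": "3",
    # consumer group membership and partition assignment
    "/groups/billing/members/consumer-a": '{"assigned": [0, 1]}',
}

def committed_offset(group: str, topic: str, partition: int) -> int:
    """Look up the last committed offset for a group/topic/partition."""
    return int(metadata[f"/groups/{group}/offsets/{topic}/{partition}"])

print(committed_offset("billing", "orders", 0))  # -> 3
```

Keeping this state in etcd rather than a vendor service is the crux of the self-hosted trade: the operator runs and backs up the metadata store, and in exchange no external control plane sits in the critical path.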

This is not a rejection of what WarpStream proved. It is the next logical step after that proof became accepted.

From Proof to Platform

WarpStream demonstrated that Kafka compatibility does not require stateful brokers. KafScale applies the same architectural correction under different operational and licensing constraints.

Dimension           WarpStream                KafScale
------------------  ------------------------  ------------------------
Kafka protocol      Compatible                Compatible
Broker state        Stateless                 Stateless
Durable storage     Object storage            Object storage (S3)
Metadata control    Vendor managed            Self-hosted (etcd)
Deployment model    Managed service           Kubernetes
License             Closed source             Apache 2.0
Primary tradeoff    Operational convenience   Operational control
In practice, KafScale keeps the properties that made the diskless model work:

  • Kafka protocol compatibility, so existing clients and tooling continue to work
  • Stateless brokers, treated as ephemeral Kubernetes pods
  • Object storage as the source of truth, using immutable segments in S3
  • Self-hosted metadata, using etcd for topic maps, offsets, and consumer group state
  • Apache 2.0 licensing, with no usage restrictions or control plane dependency

The architecture is inevitable. The deployment model should be a choice.

Tradeoffs you should understand

Separating compute from durable storage simplifies operations, but it also changes the latency profile: every durable write becomes an object-storage round trip. The model is a strong fit for durable pipelines, logs, ETL, and asynchronous event transport. It is not a universal replacement for every Kafka workload.

  • If you need sub-10ms latency, stateful brokers are usually a better fit.
  • If you rely on exactly-once transactions or compacted topics, those features are outside KafScale's current scope.
  • If you want a fully managed service, a managed offering is the right choice.
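The sub-10ms caveat above follows from simple arithmetic on the write path. The numbers below are illustrative assumptions (not measurements of any specific system), but they show why an object-storage-backed design has a latency floor well above local-disk brokers.

```python
# Back-of-envelope latency budget for the object-storage write path.
# All figures are illustrative assumptions, not measured values.
flush_interval_ms = 250   # how often a broker uploads a segment batch
put_latency_ms = 50       # assumed S3 PUT round trip
fetch_latency_ms = 50     # assumed consumer GET of the new segment

# A record waits on average half the flush interval before its segment
# is uploaded, then one PUT and one GET stand between it and a consumer.
avg_end_to_end_ms = flush_interval_ms / 2 + put_latency_ms + fetch_latency_ms
print(avg_end_to_end_ms)  # -> 225.0
```

Shrinking the flush interval trades latency for many more (and smaller) object-storage requests, which is why this architecture suits throughput-oriented pipelines rather than sub-10ms messaging.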

KafScale is built for the common case: durable replay, predictable retention, and minimal operational overhead.

Why this matters now

WarpStream accelerated industry acceptance by proving that Kafka on object storage works in production. That question is now settled.

The next phase is about control, licensing, and deployment freedom. The remaining decision is not whether to adopt this architecture, but how much control teams retain when they do.

This article builds on earlier discussions shared with the developer community.

Where to go next

If you need help with distributed systems, backend engineering, or data platforms, check my Services.
