The problem was never Kafka clients
The Kafka protocol is one of the most successful infrastructure interfaces ever shipped. It is stable, widely implemented, and deeply integrated into tooling and teams. The part that aged poorly is not the protocol. It is the original broker-centric storage model.
Stateful brokers made sense in a disk-centric era, when durability lived on the same machines that ran compute. That coupling forces partition rebalancing, replica movement, disk hot spots, slow recovery, and persistent overprovisioning.
WarpStream proved the core idea
WarpStream demonstrated that Kafka compatibility does not require broker-local disks. By storing log segments in object storage and running stateless compute to serve the Kafka protocol, they showed that elastic scaling and simplified operations are possible without changing clients.
Their work validated an architectural shift that many teams suspected was viable but had not seen proven at scale.
When Confluent acquired WarpStream in 2024, it confirmed that Kafka on object storage is no longer an experiment but a mainstream direction for streaming platforms.
You can read more about WarpStream’s architecture and approach on their site: warpstream.com
What comes after “diskless Kafka”
Once an architecture becomes credible, teams stop asking whether it works and start asking whether it fits their constraints.
WarpStream, like many modern streaming services, relies on a vendor-managed control plane for metadata and coordination. For many organizations, that is a reasonable trade-off. For others, it is a hard blocker.
- Regulated environments that cannot depend on external control planes
- Sovereign and private cloud deployments
- Teams that require open-source licensing and forkability
- Platforms that prefer self-hosted economics over managed margins
The next logical evolution: open and self-hosted
KafScale is an open-source, Apache 2.0 licensed streaming platform that applies the stateless, object-storage-backed Kafka model in a fully self-hosted form. It runs on Kubernetes, stores durable log segments in S3, and uses etcd for topic metadata, offsets, and consumer group coordination.
This is not a rejection of what WarpStream proved. It is the next logical step after that proof became accepted.
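The split between stateless brokers, object storage, and a metadata store can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not KafScale's implementation: `ObjectStore` stands in for S3, `MetadataStore` stands in for etcd, and the segment key layout and all class and method names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for S3: each key maps to an immutable segment."""
    blobs: dict = field(default_factory=dict)

    def put(self, key, records):
        if key in self.blobs:
            raise ValueError(f"segment {key} already exists and is immutable")
        self.blobs[key] = tuple(records)

    def get(self, key):
        return self.blobs[key]


@dataclass
class MetadataStore:
    """In-memory stand-in for etcd: segment index and next offset per partition."""
    segments: dict = field(default_factory=dict)     # partition -> sorted base offsets
    next_offset: dict = field(default_factory=dict)  # partition -> next offset to assign


def segment_key(partition, base_offset):
    # Hypothetical key layout, zero-padded so keys sort by offset.
    return f"{partition}/{base_offset:020d}.seg"


class Broker:
    """Keeps no log data locally; all state lives in the two shared stores."""

    def __init__(self, store, meta):
        self.store = store
        self.meta = meta

    def produce(self, partition, records):
        if not records:
            raise ValueError("refusing to write an empty segment")
        base = self.meta.next_offset.get(partition, 0)
        self.store.put(segment_key(partition, base), records)
        self.meta.segments.setdefault(partition, []).append(base)
        self.meta.next_offset[partition] = base + len(records)
        return base

    def fetch(self, partition, offset):
        """Return records from `offset` to the end of the segment that holds it."""
        bases = self.meta.segments.get(partition, [])
        i = bisect_right(bases, offset) - 1
        if i < 0 or offset >= self.meta.next_offset.get(partition, 0):
            return []
        base = bases[i]
        return list(self.store.get(segment_key(partition, base))[offset - base:])
```

Because a `Broker` holds only references to the shared stores, a second instance created later can serve records written by the first, which is the property that lets brokers be replaced without moving data.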
From proof to platform
WarpStream demonstrated that Kafka compatibility does not require stateful brokers. KafScale applies the same architectural correction under different operational and licensing constraints.
| Dimension | WarpStream | KafScale |
|---|---|---|
| Kafka Protocol | Compatible | Compatible |
| Broker State | Stateless | Stateless |
| Durable Storage | Object Storage | Object Storage (S3) |
| Metadata Control | Vendor Managed | Self-Hosted (etcd) |
| Deployment Model | Managed Service | Kubernetes |
| License | Closed Source | Apache 2.0 |
| Primary Tradeoff | Operational Convenience | Operational Control |
KafScale's design comes down to five properties:
- Kafka protocol compatibility, so existing clients and tooling continue to work
- Stateless brokers, treated as ephemeral Kubernetes pods
- Object storage as the source of truth, using immutable segments in S3
- Self-hosted metadata, using etcd for topic maps, offsets, and consumer group state
- Apache 2.0 licensing, with no usage restrictions or control plane dependency
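Keeping consumer offsets in the metadata store is what allows any replacement pod to resume a consumer group. The sketch below illustrates that idea with a versioned key-value store and a compare-and-put guard against stale writers. It is an assumption-laden illustration: `VersionedKV` is an in-memory stand-in rather than an etcd client, the key layout is hypothetical, and the source does not say that KafScale guards commits this way.

```python
class VersionedKV:
    """In-memory stand-in for an etcd-like store: every key carries a version."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        # Missing keys read as (None, 0) so a first write can expect version 0.
        return self._data.get(key, (None, 0))

    def compare_and_put(self, key, expected_version, value):
        _, version = self.get(key)
        if version != expected_version:
            return False  # someone else wrote since we last read
        self._data[key] = (value, version + 1)
        return True


def offset_key(group, topic, partition):
    # Hypothetical key layout for committed offsets.
    return f"/groups/{group}/offsets/{topic}/{partition}"


def resume_offset(kv, group, topic, partition):
    """Return (offset to resume from, version to use for the next commit)."""
    value, version = kv.get(offset_key(group, topic, partition))
    return (0 if value is None else value), version


def commit_offset(kv, group, topic, partition, offset, expected_version):
    """Commit only if nobody else has committed since `expected_version` was read."""
    return kv.compare_and_put(
        offset_key(group, topic, partition), expected_version, offset
    )
```

A pod that starts fresh calls `resume_offset`, processes records, and then calls `commit_offset` with the version it read. A pod that was paused and wakes up with an old version has its commit rejected instead of silently rewinding the group.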
The architecture is inevitable. The deployment model should be a choice.
Tradeoffs you should understand
Separating compute from durable storage simplifies operations, but it also raises latency, because a write to object storage takes longer than an append to a local disk. The model is a strong fit for durable pipelines, logs, ETL, and asynchronous event transport. It is not a universal replacement for every Kafka workload.
- If you need sub-10ms latency, stateful brokers are usually a better fit.
- If you rely on exactly-once transactions or compacted topics, those are outside KafScale's scope.
- If you want a fully managed service, a managed offering is the right choice.
KafScale is built for the common case: durable replay, predictable retention, and minimal operational overhead.
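The checklist above can be turned into a small screening helper for a workload inventory. This is only a sketch of the three bullets as written: the `Workload` fields and the `kafscale_concerns` function are hypothetical names, and the 10 ms threshold is taken directly from the text rather than from any measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    latency_budget_ms: float          # end-to-end latency the workload can tolerate
    needs_transactions: bool = False  # relies on exactly-once transactions
    needs_compaction: bool = False    # relies on compacted topics
    wants_managed_service: bool = False


def kafscale_concerns(workload):
    """Return the reasons from the tradeoff list that apply to this workload."""
    concerns = []
    if workload.latency_budget_ms < 10:
        concerns.append("sub-10ms latency: stateful brokers are usually a better fit")
    if workload.needs_transactions:
        concerns.append("exactly-once transactions are outside KafScale's scope")
    if workload.needs_compaction:
        concerns.append("compacted topics are outside KafScale's scope")
    if workload.wants_managed_service:
        concerns.append("a managed offering is the right choice")
    return concerns
```

An empty list means none of the listed tradeoffs rule the workload out. It does not mean the workload has been validated in any other respect.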
Why this matters now
WarpStream accelerated industry acceptance by proving that Kafka on object storage works in production. That question is now settled.
The next phase is about control, licensing, and deployment freedom. The remaining decision is not whether to adopt this architecture, but how much control teams retain when they do.
This article builds on earlier discussions shared with the developer community.
Where to go next
If you need help with distributed systems, backend engineering, or data platforms, check my Services.