Kafscale: Kafka-Compatible Streaming with Stateless Brokers and S3 Segment Storage
Kafscale is an Apache Kafka protocol-compatible streaming platform built for workloads that need durable message delivery, consumer offsets, and replay without running stateful Kafka brokers. Brokers are stateless pods, S3 stores immutable segments as the source of truth, and etcd handles topic metadata and consumer group offsets. Kubernetes manages scaling and failover.
The 80% use case at 30% of the cost. Most Kafka deployments function as durable pipes. They move events, track offsets, and support replay. They do not need sub-millisecond latency, exactly-once transactions, or compacted topics. Kafscale targets this common case: ~$110/month for 100GB/day throughput vs $400+ for self-managed Kafka or $200+ for managed offerings.
Who Kafscale is for
- Platform teams running Kafka as infrastructure plumbing, not as a product feature
- Data engineering teams using Kafka primarily for CDC, log aggregation, or event sourcing where latency tolerance is 100ms+
- Organizations where Kafka operational burden exceeds the value of features they do not use
- Greenfield projects that want Kafka client compatibility without Kafka operational complexity
Not a fit for: Trading systems, real-time bidding, or workloads requiring exactly-once semantics, compacted topics, or single-digit millisecond latency.
Architecture
Brokers accept Kafka protocol connections, buffer writes, flush segments to S3, serve reads with caching and read-ahead, and coordinate consumer groups. etcd stores metadata: topic configuration, partition state, consumer group membership, and committed offsets. S3 stores immutable segment and index objects that represent the message log.
What problem Kafscale solves
Most Kafka deployments function as durable pipes. They move events from point A to points B through N, track consumer offsets, and depend on replay when downstream systems fail. Many teams do not need sub-millisecond latency, exactly-once transactions, or compacted topics, but they still pay for stateful broker operations, disk management, rebalancing workflows, and on-call complexity.
Kafscale targets the common case by trading latency for operational simplicity. Brokers do not persist message logs locally. Message data is stored as immutable segments in S3. Kubernetes provides scheduling, scaling, and failover. The goal is to make durable message transport easier to operate without changing client integrations, since the system speaks the Kafka protocol.
Scope
In scope
- Kafka protocol compatibility for core producer and consumer workflows
- Produce and fetch paths backed by immutable segment storage
- Consumer groups, membership, heartbeats, and committed offsets
- Topic administration needed for everyday platform use
- Kubernetes operator integration via CRDs for cluster and topic lifecycle
Explicit non-goals
- Exactly-once semantics and transactions
- Compacted topics
- Kafka internal replication and ISR protocols
- Embedding stream processing inside the broker
Kafscale only does durable message transport. Stream processing remains the responsibility of compute engines such as Apache Flink, Apache Wayang, or any other stack that reads from Kafka topics. This keeps the broker surface area small and preserves compatibility with the Kafka ecosystem.
Storage and data model
Topics and partitions
Each topic is partitioned. Each partition is represented as an ordered sequence of immutable segment files plus a sparse index file used for offset-to-position lookup. Segment keys are based on base offsets, so storage remains append-friendly, and retention is handled with S3 lifecycle policies.
S3 key layout
s3://{bucket}/{namespace}/{topic}/{partition}/segment-{base_offset}.kfs
s3://{bucket}/{namespace}/{topic}/{partition}/segment-{base_offset}.index
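A minimal sketch of constructing these keys from the layout above. The zero-padding of the base offset is an assumption (it makes lexicographic S3 listing match offset order, a common convention); the layout itself does not specify a width.

```python
def segment_key(bucket: str, namespace: str, topic: str,
                partition: int, base_offset: int, ext: str = "kfs") -> str:
    """Build the S3 object key for a segment (.kfs) or its index (.index).

    The 20-digit zero-padding is an illustrative assumption so that
    listing a partition prefix returns segments in offset order.
    """
    return (f"s3://{bucket}/{namespace}/{topic}/{partition}/"
            f"segment-{base_offset:020d}.{ext}")

print(segment_key("events", "prod", "orders", 3, 42))
```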
Segment file format
Each segment is a self-contained file with messages and metadata. The format includes a header for identification and versioning, message batches containing the actual records, and a footer with checksums for integrity verification.
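The header/batches/footer structure described above can be sketched as follows. The magic bytes, field widths, and checksum algorithm (CRC32) are illustrative assumptions, not the actual Kafscale wire format.

```python
import struct
import zlib

MAGIC = b"KFS1"  # hypothetical magic + version marker

def encode_segment(base_offset: int, batches: list[bytes]) -> bytes:
    """Header (magic, base offset, batch count), length-prefixed batches,
    then a footer CRC32 over everything before it."""
    header = MAGIC + struct.pack(">QI", base_offset, len(batches))
    body = b"".join(struct.pack(">I", len(b)) + b for b in batches)
    payload = header + body
    return payload + struct.pack(">I", zlib.crc32(payload))

def verify_segment(data: bytes) -> bool:
    """Integrity check: magic bytes match and footer CRC agrees."""
    payload, (crc,) = data[:-4], struct.unpack(">I", data[-4:])
    return data[:4] == MAGIC and zlib.crc32(payload) == crc

seg = encode_segment(0, [b"event-a", b"event-b"])
assert verify_segment(seg)
```

The footer checksum lets a reader detect a truncated or corrupted object before handing records to consumers.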
Write path
When a producer sends messages, the broker validates ownership, buffers the data, assigns offsets, and eventually flushes to S3. The acks setting controls when the producer receives confirmation.
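A simplified model of that buffering step, assuming a size-based flush trigger. Class and parameter names are illustrative, and the real broker would also flush on age and write an encoded segment to S3 rather than a local list.

```python
class PartitionBuffer:
    """Illustrative write buffer: appends assign monotonic offsets, and a
    size threshold triggers a flush that would become one immutable segment."""

    def __init__(self, flush_bytes: int = 4 << 20):
        self.pending: list[bytes] = []
        self.pending_bytes = 0
        self.next_offset = 0
        self.flushed_segments: list[tuple[int, list[bytes]]] = []
        self.flush_bytes = flush_bytes

    def append(self, record: bytes) -> int:
        offset = self.next_offset
        self.pending.append(record)
        self.pending_bytes += len(record)
        self.next_offset += 1
        if self.pending_bytes >= self.flush_bytes:
            self.flush()
        return offset

    def flush(self) -> None:
        if not self.pending:
            return
        base = self.next_offset - len(self.pending)
        # In the real system this would be a PutObject of an encoded segment.
        self.flushed_segments.append((base, self.pending))
        self.pending, self.pending_bytes = [], 0

buf = PartitionBuffer(flush_bytes=10)
offsets = [buf.append(b"abcdef") for _ in range(3)]
```

With this model, where the producer's ack lands relative to the flush is the knob the acks setting controls: acknowledge on buffer (lower latency, broker-crash risk) or after the segment is durable in S3.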
Read path
When a consumer fetches messages, the broker locates the relevant segment, checks the cache, and retrieves data from S3 if needed. Read-ahead prefetching improves performance for sequential consumers.
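The segment-location step reduces to finding the greatest base offset at or below the requested offset, which the sparse index makes a binary search. A sketch under that assumption:

```python
import bisect

def find_segment(base_offsets: list[int], target: int) -> int:
    """Return the base offset of the segment containing `target`:
    the greatest base offset <= target. `base_offsets` must be sorted,
    e.g. from listing a partition's segment keys. A cache lookup and
    read-ahead prefetch would wrap the actual S3 GetObject that follows."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} precedes earliest retained segment")
    return base_offsets[i]

bases = [0, 1000, 2000]
assert find_segment(bases, 1500) == 1000
```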
S3 resiliency and backpressure
Kafscale deliberately avoids persistent local queues. When S3 misbehaves, the system surfaces it through protocol-native backpressure and operator automation instead of inventing new operational knobs.
Every broker tracks S3 health as Healthy, Degraded, or Unavailable based on sliding-window PutObject latency and error metrics. The same health monitor wraps the fetch path: when the bucket is Degraded, read-ahead slows and fetches emit REQUEST_TIMED_OUT; when it is Unavailable, fetches fail immediately with UNKNOWN_SERVER_ERROR so consumers can see the outage for what it is.
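A sliding-window classifier of this shape can be sketched as below. The window size and thresholds are illustrative assumptions; Kafscale's actual values are not documented here.

```python
from collections import deque
from statistics import median

class S3HealthMonitor:
    """Classify S3 health from a sliding window of call results.
    Threshold values and names are illustrative assumptions."""

    def __init__(self, window: int = 100, degraded_latency_ms: float = 500.0,
                 degraded_error_rate: float = 0.10,
                 unavailable_error_rate: float = 0.50):
        self.samples: deque = deque(maxlen=window)  # (latency_ms, ok)
        self.degraded_latency_ms = degraded_latency_ms
        self.degraded_error_rate = degraded_error_rate
        self.unavailable_error_rate = unavailable_error_rate

    def record(self, latency_ms: float, ok: bool) -> None:
        self.samples.append((latency_ms, ok))

    def state(self) -> str:
        if not self.samples:
            return "Healthy"
        error_rate = sum(not ok for _, ok in self.samples) / len(self.samples)
        if error_rate >= self.unavailable_error_rate:
            return "Unavailable"
        if (error_rate >= self.degraded_error_rate
                or median(l for l, _ in self.samples) >= self.degraded_latency_ms):
            return "Degraded"
        return "Healthy"

mon = S3HealthMonitor(window=10)
for _ in range(10):
    mon.record(40.0, True)
healthy = mon.state()
for _ in range(6):
    mon.record(40.0, False)
down = mon.state()
```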
Operator guardrails
The Kubernetes operator watches broker health via control-plane RPCs or Prometheus. When any broker is Degraded, rollouts are paused. If a quorum reports Unavailable, the operator halts HPA decisions, emits alerts, and optionally rechecks IAM credentials and endpoints before resuming.
Surfacing state
The broker exposes /metrics with Prometheus-style gauges, and BrokerControl.GetStatus returns a sentinel partition named __s3_health whose state field reflects the current S3 state. Operators or HPAs can watch either interface to gate rollouts or trigger alerts. For teams that prefer push semantics, the broker also opens a StreamMetrics gRPC stream that continuously emits the latest health snapshot, along with derived latency and error statistics, so operator automation can react without scraping delays.
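For illustration, the pull-based side might render something like the following Prometheus exposition text. The gauge names here are hypothetical; the document does not specify Kafscale's actual metric names.

```python
def render_metrics(state: str, put_p50_ms: float, error_rate: float) -> str:
    """Render a minimal Prometheus exposition snippet for S3 health.
    Metric names are hypothetical placeholders."""
    state_value = {"Healthy": 0, "Degraded": 1, "Unavailable": 2}[state]
    lines = [
        "# TYPE kafscale_s3_health gauge",
        f"kafscale_s3_health {state_value}",
        "# TYPE kafscale_s3_put_latency_p50_ms gauge",
        f"kafscale_s3_put_latency_p50_ms {put_p50_ms}",
        "# TYPE kafscale_s3_error_rate gauge",
        f"kafscale_s3_error_rate {error_rate}",
    ]
    return "\n".join(lines) + "\n"

print(render_metrics("Degraded", 120.0, 0.02))
```

Encoding the enum as a numeric gauge lets an HPA or alert rule threshold on it directly (for example, alert when the value is >= 1).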
Consumer group protocol
Kafscale implements the standard Kafka consumer group protocol. Groups transition through states as members join, leave, or fail heartbeats. The broker handles coordination, assignment, and offset tracking.
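The standard Kafka group coordinator states and the transitions the paragraph describes can be sketched as a small state machine. The event names are illustrative; the state names (Empty, PreparingRebalance, CompletingRebalance, Stable) are the standard Kafka coordinator states.

```python
# (state, event) -> next state; unknown events leave the state unchanged.
TRANSITIONS = {
    ("Empty", "member_join"): "PreparingRebalance",
    ("Stable", "member_join"): "PreparingRebalance",
    ("Stable", "heartbeat_expired"): "PreparingRebalance",
    ("PreparingRebalance", "all_members_joined"): "CompletingRebalance",
    ("CompletingRebalance", "assignment_synced"): "Stable",
    ("Stable", "last_member_left"): "Empty",
}

def step(state: str, event: str) -> str:
    return TRANSITIONS.get((state, event), state)

# A first consumer joins, the coordinator collects members, assigns partitions:
s = "Empty"
for ev in ("member_join", "all_members_joined", "assignment_synced"):
    s = step(s, ev)
```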
Operational defaults
- Bucket naming: kafscale-{environment}-{region} to isolate IAM and retention policies
- Region affinity: bucket region matches the Kubernetes cluster region to avoid cross-region cost and latency
- Encryption: SSE-KMS with a customer-managed CMK when provided; SSE-S3 fallback with a warning
- Lifecycle retention: operator-managed prefix rules derived from topic retention configuration
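As one way to picture the operator-managed prefix rules, a lifecycle rule could be derived from topic retention like this. The rule ID format is an assumption; the dictionary shape follows the standard S3 lifecycle configuration API.

```python
def lifecycle_rule(namespace: str, topic: str, retention_days: int) -> dict:
    """Derive an S3 lifecycle rule from a topic's retention setting.
    The rule ID naming scheme is a hypothetical convention."""
    return {
        "ID": f"kafscale-{namespace}-{topic}-retention",
        "Filter": {"Prefix": f"{namespace}/{topic}/"},
        "Status": "Enabled",
        "Expiration": {"Days": retention_days},
    }

rule = lifecycle_rule("prod", "orders", 7)
```

A rule like this would be passed to the bucket's lifecycle configuration (e.g. via boto3's put_bucket_lifecycle_configuration), so expired segments are deleted by S3 itself rather than by broker compaction.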
Current development status
Kafscale is under active development and not yet production-ready. Compatibility regression testing, fault injection coverage, and repeatable benchmarks are required before any production recommendation.
How to use Kafscale in an architecture
Kafscale is intended to be used as a Kafka-compatible transport layer. Producers and consumers connect using standard Kafka client libraries. Downstream compute engines such as Flink or Wayang read from Kafscale topics using their existing Kafka connectors. The platform focus remains durable delivery and replay, not embedded processing.
Related resources
If you are evaluating Kafscale or similar architectures for your organization:
- Distributed systems architecture leadership - architectural patterns for stateless services and object storage backends
- Apache Flink architecture leadership - stream processing design that pairs well with Kafscale as a source
- Iceberg data platform architecture - lakehouse patterns where Kafscale can serve as the ingestion layer
I help teams assess streaming architectures, reduce operational burden, and design cost-effective data platforms.
→ See how I work with teams or book a call.
If you need help with distributed systems, backend engineering, or data platforms, check my Services.