Showing posts from July, 2016

Handling Corrupted Kafka Messages and Offset Recovery in Distributed Systems

This article explains how corrupted Kafka messages occurred in early Kafka versions, how offsets were stored in ZooKeeper, and how to manually recover a stuck consumer. It documents the race condition described in KAFKA-2477, shows how to inspect offsets using Kafka tools or ZooKeeper, and describes code-based and operational strategies for skipping bad messages in older distributed log systems.

In older Kafka deployments, especially versions before 0.9, a message in a topic could become unreadable due to corruption. This happened most often when third-party frameworks interacted with Kafka internals or when the consumer logic encountered a rare race condition. One such condition was documented in KAFKA-2477, where a lock on Log.read was missing at the consumer level while Log.write remained protected. Under specific timing, this resulted in a corrupted message being written to ...
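
The code-based strategy of skipping a bad message can be illustrated with a minimal Python sketch. It does not use any Kafka client library: the in-memory `log`, the `CorruptMessageError` exception, and the `fetch_from_log` and `consume_skipping_corrupt` functions are all hypothetical names introduced for illustration. The idea is only that a consumer stuck on an unreadable offset records it and resumes at the next offset instead of retrying forever.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CorruptMessageError(Exception):
    """Hypothetical error: the message at an offset cannot be decoded."""


def consume_skipping_corrupt(
    fetch: Callable[[int], bytes],
    start_offset: int,
    end_offset: int,
) -> Tuple[List[bytes], List[int], int]:
    """Read offsets in [start_offset, end_offset), skipping unreadable ones.

    Returns the readable messages, the skipped offsets, and the next
    offset to resume from (the value a consumer would store as its position).
    """
    messages: List[bytes] = []
    skipped: List[int] = []
    offset = start_offset
    while offset < end_offset:
        try:
            messages.append(fetch(offset))
        except CorruptMessageError:
            # Record the bad offset so it can be investigated later,
            # then move past it instead of retrying indefinitely.
            skipped.append(offset)
        offset += 1
    return messages, skipped, offset


# In-memory stand-in for a partition; None marks a corrupted entry.
log: Dict[int, Optional[bytes]] = {0: b"a", 1: b"b", 2: None, 3: b"d"}


def fetch_from_log(offset: int) -> bytes:
    payload = log[offset]
    if payload is None:
        raise CorruptMessageError(f"unreadable message at offset {offset}")
    return payload


if __name__ == "__main__":
    msgs, bad, next_offset = consume_skipping_corrupt(fetch_from_log, 0, 4)
    print(msgs, bad, next_offset)
```

In a real deployment the skipped offsets would typically be logged, since skipping trades data completeness for consumer progress.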