Overview
The HBase sink was added to the Flume trunk and provided direct write support from Flume channels into HBase tables. It relies on synchronous HBase client operations and requires that HBase table metadata already exists. The sink handles flushes, transactions and rollbacks, allowing Flume to treat HBase as a durable storage target.
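The take, write, commit-or-rollback cycle mentioned above can be illustrated with a small model. This is a minimal sketch, not Flume's implementation: `MemoryChannel`, `drain_batch` and `write_batch` are hypothetical names invented for illustration, and the real sink works against Flume's Java channel and transaction APIs.

```python
class MemoryChannel:
    """Toy stand-in for a Flume channel (hypothetical, for illustration only)."""

    def __init__(self, events):
        self._queue = list(events)
        self._in_flight = []  # events taken in the current transaction

    def take(self):
        """Take the next event, or None if the channel is empty."""
        if not self._queue:
            return None
        event = self._queue.pop(0)
        self._in_flight.append(event)
        return event

    def commit(self):
        """The write succeeded: taken events are gone for good."""
        self._in_flight = []

    def rollback(self):
        """The write failed: put taken events back in their original order."""
        self._queue = self._in_flight + self._queue
        self._in_flight = []

    def __len__(self):
        return len(self._queue)


def drain_batch(channel, write_batch, batch_size=100):
    """Move up to batch_size events from the channel to the store.

    write_batch is any callable that persists a list of events and raises
    on failure (in the real sink this would be a synchronous HBase write).
    """
    batch = []
    while len(batch) < batch_size:
        event = channel.take()
        if event is None:
            break
        batch.append(event)
    try:
        if batch:
            write_batch(batch)
    except Exception:
        channel.rollback()  # events stay in the channel for a later retry
        raise
    channel.commit()
    return len(batch)
```

The point of the sketch is that an event only leaves the channel after the write has succeeded, which is what lets a failed HBase write be retried without losing data.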
Building Flume from Trunk
In early versions the HBase sink was only available in the trunk source. The following sequence checks out Flume and builds it using Maven:
git clone git://git.apache.org/flume.git
cd flume
git checkout trunk
mvn package -DskipTests
cd flume-ng-dist/target
Inside the repository, the sink is located under:
flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/
Important Details
- Flume uses the first hbase-site.xml it finds on the CLASSPATH. If multiple HBase versions coexist on a machine, pay attention to classpath ordering.
- The target HBase table and column family must already exist. Column qualifiers do not need to be created in advance; HBase creates them on write.
- The sink initially supported only synchronous HBase operations; asynchronous support was planned under FLUME-1252.
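To see which hbase-site.xml would win under the first-match rule described above, a small helper can walk the classpath in order. This is a rough sketch under stated assumptions: `first_hbase_site` is a hypothetical helper, it only inspects plain directory entries (not JARs or wildcard entries such as `lib/*`), and it reads the classpath string you pass in rather than whatever classpath the Flume launcher ultimately assembles.

```python
import os


def first_hbase_site(classpath, sep=os.pathsep):
    """Return the path of the first hbase-site.xml found on the classpath.

    Entries are checked in order, mirroring first-match classpath lookup.
    Only directory entries are inspected; returns None if nothing is found.
    """
    for entry in classpath.split(sep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(entry, "hbase-site.xml")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(first_hbase_site(os.environ.get("CLASSPATH", "")))
```

If the printed path belongs to a different HBase installation than the one you intend to write to, reorder the classpath entries so the right configuration directory comes first.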
Example Flume Configuration for HBase Sink
The following configuration shows a simple in-memory channel feeding an HBase sink:
host1.sources = src1
host1.sinks = sink1
host1.channels = ch1

# Source definition (Seq source)
host1.sources.src1.type = seq
host1.sources.src1.port = 25001
host1.sources.src1.bind = localhost
host1.sources.src1.channels = ch1

# HBase sink
host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
host1.sinks.sink1.channel = ch1
host1.sinks.sink1.table = test3
host1.sinks.sink1.columnFamily = testing
host1.sinks.sink1.column = foo

# Serializer (converting event data into HBase-compatible format)
host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
host1.sinks.sink1.serializer.payloadColumn = pcol
host1.sinks.sink1.serializer.incrementColumn = icol

# Channel
host1.channels.ch1.type = memory
Why the Serializer Matters
HBase expects KeyValue or Cell structures. The Flume HBase sink uses serializers to convert Flume events into a format suitable for HBase storage. The SimpleHbaseEventSerializer is a basic serializer that writes event payloads into a configured column family and qualifier.
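As a conceptual illustration of what a serializer does, the sketch below maps one event body to a write on the payload column and a counter bump on the increment column. It is not the Java `SimpleHbaseEventSerializer` code: `PutAction`, `IncrementAction` and `serialize_event` are hypothetical names, and the UUID row key and the `incRow` counter row are assumptions made for the example. Only the `payloadColumn` and `incrementColumn` settings come from the configuration above.

```python
import uuid
from dataclasses import dataclass


@dataclass
class PutAction:
    """A single cell write: row, column family, qualifier and value."""
    row: bytes
    family: bytes
    qualifier: bytes
    value: bytes


@dataclass
class IncrementAction:
    """A counter increment on one cell."""
    row: bytes
    family: bytes
    qualifier: bytes
    amount: int


def serialize_event(body, family, payload_column="pcol",
                    increment_column="icol", counter_row="incRow"):
    """Turn one event body into a (put, increment) pair.

    The row key is a random UUID and the counter lives in a fixed row;
    both choices are assumptions made for this sketch.
    """
    fam = family.encode("utf-8")
    put = PutAction(
        row=uuid.uuid4().hex.encode("utf-8"),
        family=fam,
        qualifier=payload_column.encode("utf-8"),
        value=body,
    )
    increment = IncrementAction(
        row=counter_row.encode("utf-8"),
        family=fam,
        qualifier=increment_column.encode("utf-8"),
        amount=1,
    )
    return put, increment
```

For example, `serialize_event(b"hello", "testing")` yields a put of `b"hello"` into `testing:pcol` under a fresh row key, plus an increment of `testing:icol` by one.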
Operational Notes (Modern Context)
- Flume-to-HBase pipelines still exist in legacy estates; treat them as migration candidates.
- Make sure RegionServers are not overloaded: writes in this sink are synchronous, so a slow RegionServer stalls the sink and backs up the channel.
- For modern ingestion, consider Kafka → HBase Connectors or NiFi PutHBase processors.
- Classpath conflicts remain a common operational issue with the HBase sink.