Apache Flume still appears in many legacy data estates, and most of its operational issues come from an undersized heap or undersized direct memory. This updated guide explains how to estimate memory requirements for Flume sources, sinks, and file channels, how batch sizing affects heap usage, and how replay behavior can drastically increase memory demand. The goal is to give operators a reliable sizing baseline instead of trial-and-error tuning.

Memory Requirements for Flume Sources and Sinks

The dominant memory cost for each event is its body plus a small overhead for headers (typically around 100 bytes, depending on the agent configuration). To estimate the memory for a batch:

1. Take the average or p90 event size.
2. Add a buffer for headers and variability.
3. Multiply by the maximum batch size.

This result approximates the memory required to hold a batch in a Source or Sink. A Sink needs memory for one batch at a time. A Source needs memory for one batch multiplied by the number of ...
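As a rough illustration, here is a minimal Java sketch of that estimate. The event size, header overhead, variability buffer, and batch size below are hypothetical placeholders, not recommendations; substitute measurements from your own pipeline.

```java
public class FlumeBatchMemoryEstimate {
    public static void main(String[] args) {
        // Hypothetical inputs; replace with measured values from your pipeline.
        long avgEventBodyBytes = 2_048;   // average (or p90) event body size
        long headerOverheadBytes = 100;   // per-event header overhead (~100 bytes)
        double variabilityBuffer = 1.20;  // 20% headroom for size variability
        long maxBatchSize = 1_000;        // maximum batch size configured on the source/sink

        // Per-event footprint = (body + headers) * buffer; batch = per-event * batch size.
        long perEventBytes = Math.round((avgEventBodyBytes + headerOverheadBytes) * variabilityBuffer);
        long batchBytes = perEventBytes * maxBatchSize;

        System.out.printf("Estimated memory per in-flight batch: ~%.1f MiB%n",
                batchBytes / (1024.0 * 1024.0));
    }
}
```

With these example numbers, a single in-flight batch costs roughly 2.5 MiB; a Sink would need that amount once, while a Source would need it multiplied by however many batches it holds concurrently.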