
Showing posts from November, 2012

Memory consumption in Flume

Memory required by each Source or Sink

The heap memory used by a single event is dominated by the data in the event body, with some incremental usage by any headers added. In general, a source or a sink will allocate roughly the size of the event body plus maybe 100 bytes of headers (this is affected by headers added by the txn agent). To get the total memory used by a single batch, multiply your average (or 90th-percentile) event size (plus some additional buffer) by the maximum batch size. The memory required for each Sink is the memory needed for a single batch; the memory required for each Source is the memory needed for a single batch, multiplied by the number of clients simultaneously connected to the Source. Keep this in mind, and plan your event delivery according to your expected throughput.

Memory required by each File Channel

Under normal operation, each File Channel uses some heap memory and some direct memory …

HBase major compaction per cronjob

Sometimes I get asked how an admin can run a major compaction on a particular table at a time when the cluster isn't usually used. This can be done via cron or at. The HBase shell needs a ruby script, which is very simple:

# cat m_compact.rb
major_compact 't1'
exit

A working shell script for cron, as an example:

# cat daily_compact
#!/bin/bash
USER=hbase
PWD=`echo ~$USER`
TABLE=t1
# kerberos enabled
KEYTAB=/etc/hbase/conf/hbase.keytab
HOST=`hostname`
REALM=ALO.ALT
LOG=/var/log/daily_compact
# get a new ticket
sudo -u $USER kinit -k -t $KEYTAB $USER/$HOST@$REALM
# start compaction
sudo -u $USER hbase shell $PWD/m_compact.rb 2>&1 | tee -a $LOG

All messages will be redirected to /var/log/daily_compact:

11/15/13 06:49:26 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12 row(s) in 0.7800 seconds
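To actually schedule the script, a crontab entry is needed. The fragment below is a hypothetical example: the 03:00 run time and the path /usr/local/bin/daily_compact are assumptions, so adjust both to your quiet window and to wherever the script is installed:

```shell
# /etc/crontab fragment (assumed path and time): run the compaction
# script as root every day at 03:00, when the cluster is presumed idle.
0 3 * * * root /usr/local/bin/daily_compact
```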