Why Schedule Major Compactions?
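Major compactions rewrite all store files in each store of an HBase table into a single file, dropping deleted and expired cells along the way. This improves read performance but puts additional disk and network pressure on the cluster, so many administrators run them during off-peak windows. HBase can trigger major compactions on its own at a fixed interval (hbase.hregion.majorcompaction), but it cannot pin them to a time of day, so scheduled runs are typically automated with cron or at.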
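Cron only fires at fixed times, and a delayed or manually started job can still land in business hours. A minimal sketch of a bash guard the wrapper script could call before compacting. The function name in_offpeak_window, the example 01:00-05:00 window and the install path /usr/local/bin/daily_compact are assumptions for illustration:

```shell
# Hypothetical guard: succeed only if HOUR lies in the off-peak window [START, END).
# Windows that wrap past midnight (e.g. START=22, END=4) are supported.
# 10# forces base 10, so zero-padded hours such as "08" from `date +%H` work.
in_offpeak_window() {
  local hour=$((10#$1)) start=$((10#$2)) end=$((10#$3))
  if (( start <= end )); then
    (( hour >= start && hour < end ))
  else
    (( hour >= start || hour < end ))
  fi
}

# Example crontab entry (assumed install path): run daily_compact at 03:00.
#   0 3 * * * /usr/local/bin/daily_compact
# Inside the script, skip the run when it fires outside the window:
#   in_offpeak_window "$(date +%H)" 1 5 || exit 0
```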
Ruby Script for HBase Shell
HBase shell executes commands through JRuby, so a simple script triggers the compaction:
# m_compact.rb
major_compact 't1'
exit
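Because m_compact.rb is just a list of shell commands, it can cover several tables. A minimal sketch, assuming a hypothetical bash helper write_compact_script, that generates the script from a list of table names (on HBase versions with namespaces, names such as ns:t2 are passed to major_compact unchanged):

```shell
# Hypothetical helper: write an HBase shell script to OUT that major-compacts
# every table given as a further argument, then leaves the shell with `exit`.
write_compact_script() {
  local out=$1
  shift
  {
    echo "# generated by write_compact_script"
    local table
    for table in "$@"; do
      # HBase table names cannot contain quotes, so single-quoting is safe here.
      printf "major_compact '%s'\n" "$table"
    done
    echo "exit"
  } > "$out"
}

# Usage (example table names):
#   write_compact_script ~/m_compact.rb t1 t2
```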
Cron-Compatible Shell Wrapper
Below is an example daily_compact script that refreshes a Kerberos ticket and runs the compaction via the HBase shell. It expects m_compact.rb in the hbase user's home directory. Adapt the realm, the keytab path and the table names in m_compact.rb to your environment.
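#!/bin/bash
HBASE_USER=hbase
# "~$HBASE_USER" is not tilde-expanded by bash, so look up the home directory
HBASE_HOME_DIR=$(getent passwd "$HBASE_USER" | cut -d: -f6)

# Kerberos settings
KEYTAB=/etc/hbase/conf/hbase.keytab
HOST=$(hostname)
REALM=ALO.ALT
LOG=/var/log/daily_compact

# Acquire Kerberos ticket; abort if it fails
sudo -u "$HBASE_USER" kinit -k -t "$KEYTAB" "$HBASE_USER/$HOST@$REALM" || exit 1

# Trigger major compaction (tables are listed in m_compact.rb)
sudo -u "$HBASE_USER" hbase shell "$HBASE_HOME_DIR/m_compact.rb" 2>&1 | tee -a "$LOG"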
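In the final pipeline above, the exit status of the line is that of tee, so cron never sees a failed HBase shell run. A minimal sketch, assuming a hypothetical bash helper run_and_log, that keeps the tee -a logging but returns the status of the wrapped command via PIPESTATUS. Whether hbase shell itself exits non-zero when a command in the script fails depends on the HBase version, so this can only propagate what the shell reports:

```shell
# Hypothetical helper: run a command, append its stdout and stderr to LOG,
# and return the command's own exit status instead of tee's.
run_and_log() {
  local log=$1
  shift
  "$@" 2>&1 | tee -a "$log"
  # PIPESTATUS[0] is the exit status of the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}

# Usage (placeholder path):
#   run_and_log "$LOG" sudo -u hbase hbase shell /path/to/m_compact.rb || exit 1
```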
Log Output
All messages are appended to /var/log/daily_compact. Example output may look like:
WARN conf.Configuration: hadoop.native.lib is deprecated
12 row(s) in 0.7800 seconds
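Since the log grows with every run, grepping the whole file would also report failures from earlier days. A minimal sketch, assuming a hypothetical helper errors_since and that failures show up as lines containing ERROR or Exception, that inspects only the lines written by the current run:

```shell
# Hypothetical helper: print ERROR/Exception lines appended to LOG after its
# first OFFSET lines (prints nothing if there are none).
errors_since() {
  local log=$1 offset=$2
  # tail -n +K prints from line K onward, skipping the first OFFSET lines.
  tail -n +"$((offset + 1))" "$log" | grep -E 'ERROR|Exception' || true
}

# Usage in the wrapper (assumed flow):
#   touch "$LOG"; start=$(wc -l < "$LOG")   # line count before the run
#   ... run the compaction ...
#   errs=$(errors_since "$LOG" "$start")
#   [ -z "$errs" ] || { echo "$errs" >&2; exit 1; }
```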
Operational Notes (Modern Context)
- Major compactions increase disk and network load; avoid running on overloaded RegionServers.
- Review minor-compaction thresholds and the configured compaction policy before forcing major compactions on a schedule.
- In cloud migrations or on HBase-on-EMR clusters, run compactions at times when the cluster is not scaling in or out.
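- In Kerberos environments, always verify that the ticket was actually obtained or renewed: tickets expire, and a shared keytab stops working silently once the principal's key is changed elsewhere.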
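Checking the ticket explicitly makes a failed renewal visible up front instead of surfacing later as an authentication error in the HBase shell. A minimal sketch, assuming MIT Kerberos (whose `klist -s` prints nothing and exits non-zero when no valid ticket is cached) and a hypothetical helper ensure_ticket:

```shell
# Hypothetical helper: ensure USER holds a valid Kerberos ticket, requesting
# one from KEYTAB for PRINCIPAL only if none is cached.
ensure_ticket() {
  local user=$1 keytab=$2 principal=$3
  if sudo -u "$user" klist -s; then
    return 0
  fi
  # No valid ticket: request a fresh one from the keytab.
  sudo -u "$user" kinit -k -t "$keytab" "$principal" || return 1
  # Confirm that kinit actually left a usable ticket in the cache.
  sudo -u "$user" klist -s
}

# Usage with the keytab and realm from the daily_compact example:
#   ensure_ticket hbase /etc/hbase/conf/hbase.keytab "hbase/$(hostname)@ALO.ALT" || exit 1
```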