Tuning a Hadoop cluster from a DevOps perspective requires an understanding of kernel and Linux principles. This article describes the most important parameters, along with tips for tuning them.

Memory

Modern Linux systems (kernel 2.6 and later) swap memory pages to disk to avoid out-of-memory (OOM) conditions and to protect the system from kernel freezes. Hadoop, however, runs on Java, and each service (HDFS, HBase, ZooKeeper, etc.) is typically configured with its own maximum heap size (MAXHEAPSIZE). The sum of these heap sizes must fit into the memory available on the system. A common formula for MapReduce v1:

    TOTAL_MEMORY = (Mappers + Reducers) * CHILD_TASK_HEAP + TT_HEAP + DN_HEAP + RS_HEAP + OTHER_SERVICES_HEAP + 3GB (for OS and caches)

Here TT, DN, and RS stand for the TaskTracker, DataNode, and HBase RegionServer. With MapReduce v2, YARN manages the resources, but only for services that run as YARN applications. [1], [2]

Disable swappiness

    echo 0 > /proc/sys/vm/swappiness

and persist the setting across reboots via sysctl.conf:

    echo "vm.swappiness = 0" >> /etc/sysctl.conf

In addition, RedHat has
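As a rough illustration of the MapReduce v1 memory formula above, the following sketch adds up the heap sizes for a single worker node. The function name total_memory_gb and all example values are assumptions made for illustration, not recommendations; it also assumes that every heap is given in whole gigabytes.

```shell
#!/bin/sh
# Hypothetical helper: memory (in GB) one MapReduce v1 worker node needs,
# following TOTAL_MEMORY = (Mappers + Reducers) * CHILD_TASK_HEAP
#   + TT_HEAP + DN_HEAP + RS_HEAP + OTHER_SERVICES_HEAP + 3GB.
# Arguments (slot counts, then heap sizes in whole GB):
#   $1 mappers   $2 reducers   $3 child task heap   $4 TaskTracker heap
#   $5 DataNode heap   $6 RegionServer heap   $7 other services heap
total_memory_gb() {
    os_reserve=3   # 3 GB for OS and caches, as in the formula
    echo $(( ($1 + $2) * $3 + $4 + $5 + $6 + $7 + os_reserve ))
}

# Assumed example node: 8 mappers, 4 reducers, 1 GB child heap,
# 1 GB TaskTracker, 1 GB DataNode, 8 GB RegionServer, 1 GB other services.
total_memory_gb 8 4 1 1 1 8 1   # prints 26
```

If the result exceeds the physical memory of the node, the slot counts or heap sizes need to be reduced; otherwise the node will start to swap under load.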
Hey, I'm Alex. I founded X-Warp, Infinimesh, Infinite Devices, Scalytics and worked with Cloudera, E.On, Google, Evariant, and had the incredible luck to build products with outstanding people in my life, across the globe.