The Data-Driven CPO | Build Better Products

Posts

Showing posts from November, 2014

Hadoop server performance tuning

Tuning a Hadoop cluster from a DevOps perspective requires an understanding of kernel and Linux principles. The following article describes the most important parameters along with tricks for optimal tuning.

Memory

Typically, modern Linux systems (Linux 2.6+) swap memory to disk to avoid out-of-memory (OOM) conditions and protect the system from kernel freezes. But Hadoop runs on Java, and Java is typically configured with a MAXHEAPSIZE per service (HDFS, HBase, ZooKeeper, etc.). The configured heaps must fit into the memory available in the system. A common formula for MapReduce v1:

TOTAL_MEMORY = (Mappers + Reducers) * CHILD_TASK_HEAP + TT_HEAP + DN_HEAP + RS_HEAP + OTHER_SERVICES_HEAP + 3GB (for OS and caches)

For MapReduce v2, YARN takes care of the resources, but only for services that run as YARN applications. [1], [2]

Disable swappiness:

echo 0 > /proc/sys/vm/swappiness

and persist the setting across reboots via sysctl.conf:

echo "vm.swappiness = 0" >> /etc/sysctl.conf

In addition, RedHat has
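As a rough illustration of the MapReduce v1 memory formula above, the following sketch adds up the terms for one worker node. The function name, the parameter names, and all sample values are assumptions made for this example, not recommendations; TT, DN, and RS are read here as TaskTracker, DataNode, and RegionServer.

```python
def required_memory_gb(mappers, reducers, child_task_heap_gb,
                       tt_heap_gb, dn_heap_gb, rs_heap_gb,
                       other_services_heap_gb=0.0, os_reserve_gb=3.0):
    """Sum the terms of the MapReduce v1 formula; all sizes are in GB."""
    # Every mapper and reducer slot runs a child JVM with its own heap.
    task_heap_gb = (mappers + reducers) * child_task_heap_gb
    # Add the daemon heaps and the 3 GB reserve for the OS and caches.
    return (task_heap_gb + tt_heap_gb + dn_heap_gb + rs_heap_gb
            + other_services_heap_gb + os_reserve_gb)


# Hypothetical node: 8 mapper and 4 reducer slots with 1 GB child heaps.
total = required_memory_gb(mappers=8, reducers=4, child_task_heap_gb=1.0,
                           tt_heap_gb=1.0, dn_heap_gb=1.0, rs_heap_gb=8.0,
                           other_services_heap_gb=1.0)
print(f"Required memory: {total} GB")  # 12 + 1 + 1 + 8 + 1 + 3 = 26.0
```

If the result exceeds the physical RAM of the node, the slot count or the heap sizes would need to come down, since the configuration is meant to fit without relying on swap.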
