Skip to main content


Showing posts from November, 2010

Get an Hadoop cluster running in 20 minutes

Since Cloudera was starting to build a complete stack it is more easy to get a cluster up and running. They build rpm's for Redhat based systems, I use CentOS, the newest CDH build is CDH2, which comes with a lot nice addons. Thanks for the work, guys! What hardware we should use? Simple, moderate servers are pretty nice for. 2 CPU, 8 GB RAM, 4x750GB HDD will be enough. The master should have 4 CPU, 8 or more GB RAM and 100 GB RAID 5, we need that hardware twice.Why? The master is the most important server, hosting the namenode, jobtracker and, if you setup, ganglia. The hardware you use depends on your usecases, what I describe there works for eventanalysis and weblogs. But it will be a good start to see what power hadoop can deliver. The systems are located in the same rack so we can rack awareness moving in background. Use lvm and create a large directory, I use /opt/hadoop: # df -h /opt/hadoop/ Filesystem            Size  Used Avail Use% Mounted on /dev/mapper/myvg-hado