
Posts

Showing posts from November, 2011

NFS exported HDFS (CDH3)

There are cases where it is useful to make an HDFS filesystem available across networks as an exported share. Here I describe a working scenario that uses only the tools Linux and Hadoop have on board. I used FUSE and libhdfs to mount an HDFS filesystem. Change namenode.local and <PORT> to fit your environment.

Install:

    yum install hadoop-0.20-fuse.x86_64 hadoop-0.20-libhdfs.x86_64

Create a mountpoint:

    mkdir /hdfs-mount

Mount your HDFS (testing):

    hadoop-fuse-dfs dfs://namenode.local:<PORT> /hdfs-mount -d

You will see output like this:

    INFO fuse_options.c:162 Adding FUSE arg /hdfs-mount
    INFO fuse_options.c:110 Ignoring option -d
    unique: 1, opcode: INIT (26), nodeid: 0, insize: 56
    INIT: 7.10
    flags=0x0000000b
    max_readahead=0x00020000
    INFO fuse_init.c:101 Mounting namenode.local:<PORT>
    INIT: 7.8
    flags=0x00000001
    max_readahead=0x00020000
    max_write=0x00020000
    unique: 1, error: 0 (Success), outsize: 40

Hit Ctrl-C after you see "Su

All in one HDFS Cluster for your pocket

Update 1 (Nov 21, 2011):
- added 3rd interface as host-only adapter (hadoop1)
- enabled trusted device eth2

About one year ago, I created a small Xen environment for my engineering purposes. When I was traveling for hours, it was very helpful for tracking down issues or testing new features. The problem was that I had to carry two notebooks with me. That was the reason I switched to VirtualBox [1], which runs on OS X, Linux and Windows as well. I could play with my servers, and once I had configured them to death, I reimported them into a clean setup. I think this will also be a good start for people who are new to the Hadoop ecosystem and want to see its power without the pain of configuring a multi-node environment. The appliance is created with VirtualBox, because it runs on OS X and Windows very easily. The idea behind it is to check new settings in a small environment rather easily; the appliance is designed for research, not for development and certainly not for production. The a

HDFS debugging scenario

The first step in debugging issues in a running Hadoop environment is to take a look at the stack traces. They are easily accessible at jobtracker/stacks, which shows all running threads in a jstack-like view. As an example, I discuss a lab testing scenario, see below.

    http://jobtracker:50030/stacks

    Process Thread Dump:
    43 active threads
    Thread 3203101 (IPC Client (47) connection to NAMENODE/IP:9000 from hdfs):
      State: TIMED_WAITING
      Blocked count: 6
      Waited count: 7
      Stack:
        java.lang.Object.wait(Native Method)
        org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:676)
        org.apache.hadoop.ipc.Client$Connection.run(Client.java:719)

In this case the RPC connection is in the state "TIMED_WAITING", with a blocked count of 6 and a waited count of 7. That means the namenode could not answer the RPC request fast enough. The problem lies in the setup, as I often see in production environments. For demonstration I use an ESX cluster with a VM for the namen
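
With 43 active threads, reading the dump by eye gets tedious, so a small script can summarize it. The following is only a minimal sketch in Python: it assumes the dump text has the layout shown in the excerpt above (a "Thread <id> (<name>):" header followed by "State", "Blocked count" and "Waited count" lines). The names parse_thread_dump and count_states are hypothetical helpers, not part of Hadoop, and fetching the page from the jobtracker is left out.

```python
import re
from collections import Counter

# Sample copied from the dump shown above. In practice the text would come
# from the jobtracker's /stacks page; fetching it is not part of this sketch.
SAMPLE_DUMP = """\
Process Thread Dump:
43 active threads
Thread 3203101 (IPC Client (47) connection to NAMENODE/IP:9000 from hdfs):
  State: TIMED_WAITING
  Blocked count: 6
  Waited count: 7
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:676)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:719)
"""

# Assumed line layouts, inferred from the excerpt only.
THREAD_RE = re.compile(r"^Thread (\d+) \((.*)\):\s*$")
FIELD_RE = re.compile(r"^\s*(State|Blocked count|Waited count):\s*(\S+)\s*$")


def parse_thread_dump(text):
    """Return one dict per thread with its id, name, state and counters."""
    threads = []
    current = None
    for line in text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = {"id": int(header.group(1)), "name": header.group(2)}
            threads.append(current)
            continue
        field = FIELD_RE.match(line)
        if field and current is not None:
            key, value = field.groups()
            # Counters are numeric, the state is a plain string.
            current[key] = int(value) if value.isdigit() else value
    return threads


def count_states(threads):
    """Count how many threads are in each state."""
    return Counter(t.get("State", "UNKNOWN") for t in threads)


if __name__ == "__main__":
    parsed = parse_thread_dump(SAMPLE_DUMP)
    for state, n in count_states(parsed).most_common():
        print(state, n)
```

Many IPC client threads in TIMED_WAITING with growing waited counts would point in the same direction as the single thread discussed above, but the script only counts states and does not diagnose anything by itself.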