Export HDFS over CIFS (Samba3)

Three weeks ago I experimented with libhdfs and NFS, but I did not get the results I expected. My next idea was: why not use Samba? Samba 3.x is stable, and most operating systems can mount an exported share.
The main task was to investigate the setup and performance of this scenario, because Samba offers many tuning options. Let's go!

I used RHEL 5.7 and the RPMs shipped with it:
#> rpm -qa|grep samba
samba-3.0.33-3.29.el5_7.4.x86_64
samba-common-3.0.33-3.29.el5_7.4.x86_64

As described in "NFS exported HDFS", I mounted HDFS over FUSE at /123/hdfs via /etc/fstab:

#> cat /etc/fstab
[..]
hadoop-fuse-dfs#dfs://NAMENODE:9000 /123/hdfs fuse usetrash,rw 0 0

and checked it:
#> mount
[..]
fuse on /123/hdfs type fuse (rw,nosuid,nodev,allow_other,default_permissions)

#> ls -la /123
total 16
drwxr-xr-x 3 root root 4096 Dec 9 16:36 .
drwxr-xr-x 27 root root 4096 Dec 9 12:11 ..
drwxr-xr-x 5 hdfs nobody 4096 Dec 9 02:14 hdfs
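
If the FUSE mount is missing, Samba would export the empty directory underneath it, so it can help to check the mount from a script before starting the server. The following is a minimal sketch, assuming a Linux-style mount table such as /proc/mounts; the function name `is_fuse_mount` and its optional second argument are illustrative and not part of Hadoop or Samba.

```shell
#!/bin/sh
# Sketch: succeed if the given directory is a FUSE mount point.
# is_fuse_mount is a hypothetical helper; the second argument lets you
# point it at a different mount table (default: /proc/mounts).
is_fuse_mount() {
    dir=$1
    table=${2:-/proc/mounts}
    # Mount table columns: device mountpoint fstype options dump pass
    awk -v d="$dir" '$2 == d && $3 ~ /^fuse/ { found = 1 } END { exit !found }' "$table"
}
```

A possible use is `is_fuse_mount /123/hdfs || echo "HDFS is not mounted"`.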

The next step is to configure Samba. I ended up with this configuration:

#> cat /etc/samba/smb.conf
[global]
bind interfaces only = yes
deadtime = 15
default case = lower
disable netbios = yes
interfaces = eth0
dns proxy = no
workgroup = HDFS
server string = Samba Server Version %v
socket options = TCP_NODELAY IPTOS_LOWDELAY SO_RCVBUF=65536 SO_SNDBUF=65536
load printers = no
max connections = 30
strict sync = no
sync always = no
syslog = 1
syslog only = yes
security = user
smb passwd file = /etc/samba/smbpasswd

[hdfs]
comment = HDFS
path = /123/hdfs
public = yes
writable = yes
printable = no
create mask = 0744
force user = hdfs
force group = nobody
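
Before restarting the server, the file can be checked with Samba's own `testparm -s /etc/samba/smb.conf`, which parses the configuration and prints it without waiting for a keypress. In addition, here is a small sketch that reads the `path` of a share so it can be compared with the FUSE mount point. The helper name `share_path` is an assumption for illustration and not part of Samba, and unlike Samba the sketch matches section and parameter names case-sensitively.

```shell
#!/bin/sh
# Sketch: print the "path" setting of one share in an smb.conf-style file.
# share_path is a hypothetical helper, not a Samba tool.
share_path() {
    share=$1
    conf=${2:-/etc/samba/smb.conf}
    awk -v want="[$share]" '
        # A section header: remember whether it is the share we want.
        /^[ \t]*\[/ {
            sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")
            in_share = ($0 == want)
            next
        }
        # Inside the wanted share: print the value after "path =".
        in_share && /^[ \t]*path[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print
            exit
        }
    ' "$conf"
}
```

With the configuration above, `share_path hdfs` should print /123/hdfs.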

Then I created the Samba user and password. Here I used the HDFS system user (id=hdfs, group=nobody) as the username:

#> smbpasswd -a username

Finally, I started the server:
#> service smb restart

Test cases
For testing I used another RHEL 5.7 server and mounted the exported share at /test:
#> mount -t cifs -o username=hdfs,rw //SAMBASERVER/hdfs /test
Password: HERE_THE_PASSWORD
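
For unattended mounts, mount.cifs can read the username and password from a file given with the `credentials=` option instead of prompting. The sketch below writes such a file with owner-only permissions; the helper name `write_cifs_credentials` and the file location used in the note are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: write a credentials file for mount.cifs, readable only by its owner.
# write_cifs_credentials is a hypothetical helper name.
write_cifs_credentials() {
    file=$1
    user=$2
    pass=$3
    (
        umask 077    # a newly created file gets mode 0600
        printf 'username=%s\npassword=%s\n' "$user" "$pass" > "$file"
    )
    chmod 600 "$file"    # also tighten a file that already existed
}
```

After something like `write_cifs_credentials /root/.smbcredentials hdfs HERE_THE_PASSWORD`, the mount could be done with `mount -t cifs -o credentials=/root/.smbcredentials,rw //SAMBASERVER/hdfs /test`.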

check:
#> ls -la /test/
total 8
drwxr-xr-x 5 hdfs nobody 0 Dec 9 02:14 .
drwxr-xr-x 25 root root 4096 Dec 9 15:03 ..
drwxr-xr-x 3 hdfs nobody 0 Dec 9 02:12 mapred
drwxr-xr-x 3 hdfs nobody 0 Dec 9 02:13 opt
drwxr-xr-x 6 hdfs nobody 0 Dec 9 15:56 user

Now the HDFS from my test cluster is exported via Samba. So far, so good.

My first test measured the read performance. I chose an rsync of a smaller log file collection:
#> cd /tmp/rsync-test
#> rsync -av /test/hdfs/user/flume/weblogs/2011-12-07/ .
sent 20478888644 bytes received 92606 bytes 17377158.46 bytes/sec
total size is 20475835998
(about 19 GiB at roughly 16.6 MiB/s)

How many files did I sync?
#> find . -type f |wc -l
4665

Okay, that worked. Then I tested the write speed, using a plain file that I created with

#> dd if=/dev/zero of=/tmp/13GB bs=128M count=100

and copied into the CIFS mount, timing the copy with "time":
#> time cp /tmp/13GB /test/hdfs/user/
real 7m57.864s
user 0m0.328s
sys 0m20.602s

= around 27 MiB/s (12800 MiB in about 478 s)
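
The figure can be reproduced from the dd parameters and the "real" time of the copy. This is only a sketch of the arithmetic; the helper name `throughput_mib` is made up for illustration.

```shell
#!/bin/sh
# Sketch: file size produced by dd and the average write throughput of the copy.
# throughput_mib is a hypothetical helper name.

# dd's "M" suffix means 1024*1024 bytes, so bs=128M count=100 writes:
bytes=$((128 * 1024 * 1024 * 100))
echo "$bytes bytes"

# Average throughput in MiB/s from a byte count and elapsed seconds.
throughput_mib() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / (1024 * 1024) }'
}

# "real 7m57.864s" from the cp run is 7*60 + 57.864 = 477.864 seconds
throughput_mib "$bytes" 477.864
```

This prints 13421772800 bytes and 26.8. The byte count is 12.5 GiB, or about 13.4 GB in decimal units, which is where the file name comes from.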

Then I checked the permissions and ownership in HDFS:

hdfs#> hadoop dfs -ls /user
Found 1 item
-rw-r--r-- 3 hdfs supergroup 13421772800 2011-12-09 15:56 /user/13GB

For comparison, I ran a write test with scp:
#> scp /tmp/13GB hdfs@SAMBASERVER:/123/hdfs/user

and got
13GB 100% 13GB 47.8MB/s 04:28

which is much faster, roughly 1.8 times the CIFS write rate. The Samba overhead clearly costs performance.

Conclusion
It is possible to export an HDFS filesystem to clients over libhdfs and Samba and get acceptable results. That makes some tasks easier, including the use of HDFS as (limited) cluster storage.

Links:
Samba-Tuning: https://calomel.org/samba_optimize.html

Comments

Anonymous, 20 December 2011
I found your post saying you were having some issues at the start of December 2011. How are you finding the FUSE module now? Is it much more reliable, or are there settings that help with reliability/performance?

Anonymous, 23 December 2011
With CIFS I got good results, and you have the opportunity to tune Samba a lot. It runs stably, but you have to expect lower performance than with raw writes.

Anonymous, 19 January 2012
Not really. I have tested from Windows 7, Mac OS X and Linux clients. What is the error message in the event log?

pieland, 31 January 2012
I still have problems.

The error code on Windows 2008 and Windows 7 is 0x8007045d when uploading.

Logging in and downloading are OK, but uploading fails.

I've tested with Windows XP SP3, Windows 7 and Windows 2008 R2 clients.
With the same Samba configuration without HDFS, uploading and downloading are OK.

Samba version 3.0.33-3.29.el5_7.4
Hadoop 0.20.2
My email address is jaeminj@gmail.com
If you are interested in my configuration, I can open my servers to you.

Thanks for your response.

Anonymous, 31 January 2012
http://superuser.com/a/315460

This depends on the NTLM authentication from Windows 7. The client does not have the right to write, and that is due to the missing Kerberos setup. Windows sends the given username; if you run as administrator, uid 0 is sent, and we have prohibited uid 0.
Solution:
Set up your cluster with Kerberos authentication, or add the user in Windows and work in that user's context.

Unknown, 15 August 2012
Does Samba 3 have any kernel requirement like NFS?

Anonymous, 15 August 2012
No. As far as I know, it only needs xattr support and ext3.
