Many engineers still ask for a simple and fast way to secure a Hadoop cluster without integrating Active Directory or enterprise-wide authentication. This guide modernizes the classic lightweight approach: deploy a small MIT Kerberos KDC, create only the required Hadoop service principals, and enable Kerberos-based authentication across HDFS and YARN.
If you need a deeper dive into multi-realm trust, security architecture or production-grade Kerberos setups, see the extended guide: Hadoop and Trusted MITv5 Kerberos.
1. Install and Configure an MIT Kerberos KDC
Install the Kerberos server packages and adjust the default configuration in /etc/krb5.conf. Replace the example realm ALO.ALT with your own. Below is an example for a small cluster:
[libdefaults]
default_realm = ALO.ALT
dns_lookup_realm = false
dns_lookup_kdc = false
[realms]
ALO.ALT = {
kdc = HADOOP1.ALO.ALT:88
admin_server = HADOOP1.ALO.ALT:749
default_domain = ALO.ALT
}
[domain_realm]
.alo.alt = ALO.ALT
alo.alt = ALO.ALT
[logging]
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmin.log
default = FILE:/var/log/krb5lib.log
Ensure your DNS or /etc/hosts is consistent across nodes:
192.168.56.101 hadoop1.alo.alt hadoop1
172.22.2.130 hadoop2.alo.alt hadoop2
Allow administrative principals:
# cat /var/kerberos/krb5kdc/kadm5.acl
*/admin@ALO.ALT *
Initialize your realm:
kdb5_util create -s
systemctl start krb5kdc kadmin
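The kadm5.acl entry above grants full privileges to any */admin principal, but no such principal exists yet after creating the database. One way to bootstrap one (the name root/admin here is only an example, not a requirement) is:

```shell
# Create an admin principal that matches the */admin@ALO.ALT ACL entry
kadmin.local -q "addprinc root/admin@ALO.ALT"

# Also make the KDC and admin server start on boot
systemctl enable krb5kdc kadmin
```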
2. Add Hadoop Service Principals
Use kadmin.local on the KDC host to create service principals. For HDFS, YARN, MapReduce, HBase (optional) and HTTP SPNEGO:
addprinc -randkey hdfs/hadoop1.alo.alt@ALO.ALT
addprinc -randkey mapred/hadoop1.alo.alt@ALO.ALT
addprinc -randkey yarn/hadoop1.alo.alt@ALO.ALT
addprinc -randkey hbase/hadoop1.alo.alt@ALO.ALT
addprinc -randkey HTTP/hadoop1.alo.alt@ALO.ALT
addprinc <USERNAME>@ALO.ALT
Your user principal will require a password. After creation, switch to that user and test authentication:
su - <USERNAME>
kinit
3. Export Keytabs for Hadoop Services
Run these commands inside kadmin.local on the KDC host; the -norandkey flag exports the current keys without rotating them, so existing tickets stay valid:
xst -norandkey -k hdfs.keytab hdfs/hadoop1.alo.alt@ALO.ALT HTTP/hadoop1.alo.alt@ALO.ALT
xst -norandkey -k mapred.keytab mapred/hadoop1.alo.alt@ALO.ALT HTTP/hadoop1.alo.alt@ALO.ALT
xst -norandkey -k yarn.keytab yarn/hadoop1.alo.alt@ALO.ALT HTTP/hadoop1.alo.alt@ALO.ALT
Fix permissions and deploy them to the proper Hadoop configuration directory:
chown hdfs:hadoop hdfs.keytab && chmod 400 hdfs.keytab
chown mapred:hadoop mapred.keytab && chmod 400 mapred.keytab
chown yarn:hadoop yarn.keytab && chmod 400 yarn.keytab
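Before distributing the keytabs it is worth confirming that each file actually contains the expected principals; klist can read a keytab directly:

```shell
# List the principals and key timestamps stored in the keytab
klist -k -t hdfs.keytab
```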
4. Enable Kerberos in HDFS
Update hdfs-site.xml:
dfs.block.access.token.enable = true
dfs.namenode.keytab.file = <PATH>/hdfs.keytab
dfs.namenode.kerberos.principal = hdfs/_HOST@ALO.ALT
dfs.namenode.kerberos.internal.spnego.principal = HTTP/_HOST@ALO.ALT
dfs.secondary.namenode.keytab.file = <PATH>/hdfs.keytab
dfs.secondary.namenode.kerberos.principal = hdfs/_HOST@ALO.ALT
dfs.datanode.keytab.file = <PATH>/hdfs.keytab
dfs.datanode.kerberos.principal = hdfs/_HOST@ALO.ALT
dfs.web.authentication.kerberos.principal = HTTP/_HOST@ALO.ALT
dfs.web.authentication.kerberos.keytab = <PATH>/hdfs.keytab
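In hdfs-site.xml each of these settings is written as a property element. A minimal sketch of the first entries, assuming /etc/hadoop/conf as the deployed keytab location (substitute your actual path):

```xml
<property>
  <name>dfs.block.access.token.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@ALO.ALT</value>
</property>
```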
Start the NameNode, authenticate with the service keytab, and validate access:
sudo -u hdfs kinit -k -t hdfs.keytab hdfs/hadoop1.alo.alt@ALO.ALT
sudo -u hdfs hadoop fs -ls /
Then set correct permissions for the temporary directory:
sudo -u hdfs hadoop fs -chmod 1777 /tmp
5. Enable Kerberos in MapReduce and YARN
Example properties for mapred-site.xml (these are the MRv1 JobTracker/TaskTracker names; on YARN the equivalent keytab and principal settings belong to the ResourceManager and NodeManager in yarn-site.xml):
mapreduce.jobtracker.kerberos.principal = mapred/_HOST@ALO.ALT
mapreduce.jobtracker.keytab.file = <PATH>/mapred.keytab
mapreduce.tasktracker.kerberos.principal = mapred/_HOST@ALO.ALT
mapreduce.tasktracker.keytab.file = <PATH>/mapred.keytab
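On a YARN cluster the analogous settings live in yarn-site.xml. A sketch, again assuming /etc/hadoop/conf as the keytab location:

```xml
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>
</property>
<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/_HOST@ALO.ALT</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/hadoop/conf/yarn.keytab</value>
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>yarn/_HOST@ALO.ALT</value>
</property>
```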
Ensure the TaskTracker (or, on YARN, the NodeManager via its container-executor configuration) uses the correct group and directories:
# /etc/hadoop/conf/taskcontroller.cfg
hadoop.log.dir=/var/log/hadoop-mapreduce/
mapred.local.dir=/opt/hadoop/hdfs/mapred/local
mapreduce.tasktracker.group=mapred
banned.users=mapred,hdfs,bin
min.user.id=500
6. Validate the Secure Cluster
Restart the YARN and MapReduce daemons, authenticate using kinit, and run a simple job (e.g., the PI example) to confirm that secure submission works end-to-end. Use klist to verify ticket validity.
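A typical end-to-end check, assuming the standard examples jar shipped with your distribution (the exact jar path varies between versions and distributions):

```shell
# Obtain a ticket as a regular user, verify it, then submit the Pi example
kinit <USERNAME>@ALO.ALT
klist
yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10
```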
For more advanced Kerberos trust and production hardening techniques, refer to: Hadoop and Trusted MITv5 Kerberos.
If you need help with distributed systems, backend engineering, or data platforms, check my Services.