Posts

Showing posts from February, 2015

Setting Up MIT Kerberos ↔ Active Directory Cross-Realm Trust for Secure Hadoop Clusters

This post explains how to configure a secure cross-realm Kerberos trust between an MIT KDC and Active Directory for Hadoop environments. It covers modern Kerberos settings, realm definitions, encryption choices, KDC configuration, AD trust creation, and Hadoop’s auth_to_local mapping rules. A final section keeps legacy compatibility notes for older Windows Server versions, so the article can be used across mixed enterprise environments.

Integrating Hadoop with enterprise identity systems often requires establishing a cross-realm Kerberos trust between a local MIT KDC and an Active Directory (AD) domain. This setup allows Hadoop services to authenticate users from AD while maintaining a separate Hadoop-managed realm. We walk through a full MIT Kerberos ↔ AD trust configuration using a modern setup, while preserving legacy notes for older Windows environments still found in long-lived clusters.

Example Realms

Replace these with your actual realms and hosts: ALO.LOCA...

The Rise and Fall of SQL-on-Hadoop: What Happened and What Replaced It

SQL-on-Hadoop once promised interactive analytics on distributed storage and transformed early big data architectures. Many engines emerged (Hive, Impala, Drill, Phoenix, Presto, Spark SQL, Kylin, and others), each attempting to bridge the gap between Hadoop’s batch-processing roots and the need for low-latency SQL. This article revisits that era, explains why most of these systems faded, and outlines the modern successors that dominate today’s lakehouse and distributed SQL landscape.

The SQL-on-Hadoop Era: What We Learned and What Replaced It

In the early 2010s, Apache Hadoop became the backbone of large-scale data processing. As businesses demanded interactive analytics on top of HDFS, a wave of SQL engines emerged. The goal: bring familiar relational querying to a distributed storage layer originally designed for MapReduce batch jobs. By 2015, SQL-on-Hadoop was the hottest category in big data. Today, in 2025, most of those systems have disappeared, evolved, or been replac...