

Showing posts from April, 2015

How Spark Integrates with Hive Today (and Why Early CDH Versions Required Manual Setup)

Modern Spark integrates with Hive through the SparkSession catalog, giving unified access to Hive tables without manual classpath or configuration hacks. Earlier CDH 5.x deployments required copying hive-site.xml into Spark's configuration directory and adjusting executor classpaths, because Hive integration was not yet fully supported. This updated guide explains the current approach and provides historical context for engineers maintaining legacy clusters. Hive metastore integration is now stable and supported out of the box.

Using Hive from Spark Today

Create a SparkSession with Hive support enabled:

```scala
val spark = SparkSession.builder()
  .appName("SparkHive")
  .enableHiveSupport()
  .getOrCreate()
```

Once enabled, Spark can query Hive tables directly:

```scala
spark.sql("SELECT COUNT(*) FROM sample_07").show()
```

Spark hand...
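For engineers maintaining the legacy CDH 5.x clusters mentioned above, the old manual setup looked roughly like the following. This is a sketch, not an exact procedure: the paths are illustrative assumptions and vary by CDH release and parcel layout.

```shell
# Legacy CDH 5.x manual setup (NOT needed on modern Spark).
# Assumed paths below are typical for parcel-based CDH installs; verify on your cluster.

# 1. Make the Hive metastore configuration visible to Spark by copying
#    hive-site.xml into Spark's conf directory.
cp /etc/hive/conf/hive-site.xml /etc/spark/conf/

# 2. Put the Hive client jars on the driver classpath when launching the
#    shell, so Spark's HiveContext can reach the metastore classes.
spark-shell --driver-class-path "/opt/cloudera/parcels/CDH/lib/hive/lib/*"
```

With SparkSession.builder().enableHiveSupport(), none of this is required: Spark picks up hive-site.xml from its conf directory automatically and ships the needed Hive classes itself.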