Showing posts from April, 2015

Hive on Spark at CDH 5.3

Since Hive on Spark is not (yet) officially supported by Cloudera, some manual steps are required to get it working within CDH 5.3. Please note that, in addition to the hands-on work, there are four important requirements:

- Spark Gateway nodes need to be Hive Gateway nodes as well.
- If the client configurations are redeployed, you need to copy hive-site.xml again.
- If CDH is upgraded (including minor patches, which are often applied without you noticing), you need to adjust the class paths.
- Hive libraries need to be present on all executors (CM should take care of this automatically).

Log in to your Spark server(s) and copy the running hive-site.xml to Spark:

    cp /etc/hive/conf/hive-site.xml /etc/spark/conf/

Start your Spark shell with the following command (replace <CDH_VERSION> with your parcel version, e.g. 5.3.2-1.cdh5.3.2.p0.10) and load the Hive context within spark-shell:

    spark-shell --master yarn-client --driver-class-path "/opt/cloudera/parcels/CDH-<CDH_VERSIO