Hive’s embedded Derby database is fine for local testing, but it breaks down as soon as multiple users and services need to share metadata. This guide shows how to move Hive from the default single-user Derby setup to a shared MySQL metastore: configuring MySQL, creating the Hive schema, wiring Hive to the external database, and distributing drivers and configuration across a Hadoop cluster.

Apache Hive provides a SQL-like query language (HiveQL) on top of HDFS, making large-scale data analysis accessible to anyone with SQL experience. By default, however, Hive uses an embedded Derby database for its metastore, which is not suited for multi-user or multi-service environments.

To run Hive in a real cluster, you need an external metastore database. The metastore stores metadata about:

- Databases and tables
- Partitions and storage descriptors
- SerDes, column definitions, and privileges

This article walks through configuring a MySQL-based Hive metastore, suitable for...
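As a preview of the wiring step, pointing Hive at an external MySQL metastore comes down to a handful of JDO connection properties in `hive-site.xml`. The sketch below uses a hypothetical host `metastore-db.example.com`, database name `hive_metastore`, and user `hiveuser`; substitute your own values.

```xml
<!-- hive-site.xml: minimal sketch of the MySQL metastore connection.
     Hostname, database name, user, and password are placeholders. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-db.example.com:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
</configuration>
```

The MySQL JDBC driver JAR must also be on Hive’s classpath (typically dropped into `$HIVE_HOME/lib/`), and the schema itself is created once with Hive’s `schematool` utility, both of which are covered in the steps that follow.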