Note (2025): This guide describes the original HBase integration introduced around HIVE-705 for Hive 0.9 and HBase 0.92. Modern Hadoop stacks often favour storing analytics data in Parquet or Iceberg and querying it via engines like Hive, Impala, Trino or Spark. Use this pattern primarily when you need to understand or maintain existing Hive-on-HBase tables, not for new designs.
Hive–HBase integration in a nutshell
Hive can read from and write to HBase tables using a dedicated storage handler. Once configured, you can:
- Run SELECT queries over HBase-backed tables.
- Use INSERT to write data into HBase via Hive.
- Expose existing HBase tables to Hive using CREATE EXTERNAL TABLE.
The feature is implemented by HBaseStorageHandler and was introduced in HIVE-705. At the time of writing, Hive 0.9 required HBase 0.92 or newer for this integration to work.
Prerequisites and storage handler
The HBase storage handler ships with Hive and should be available in the Hive library directory:
$HIVE_HOME/lib/hive-hbase-handler*.jar
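If the jar is not picked up automatically, a common approach in that era was to pass the handler and its dependencies explicitly on Hive's auxiliary path at startup. A sketch (the exact jar versions below are illustrative and depend on your installation):

```
hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.9.0.jar,$HIVE_HOME/lib/hbase-0.92.0.jar,$HIVE_HOME/lib/zookeeper-3.4.3.jar
```

Everything listed on --auxpath is added to the classpath of the Hive CLI session and shipped with the MapReduce jobs it launches.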
In that era, the handler required:
- Hadoop 0.20.x or later
- ZooKeeper 3.3.4 or later
- Matching HBase and Hive versions (e.g. HBase 0.92 with Hive 0.9)
Configuring Hive to see HBase
Hive needs to know where to find the HBase configuration so that it can locate ZooKeeper, the HBase master and the region servers. One simple way is to put HBase's configuration directory on Hive's auxiliary path in hive-site.xml, which places hbase-site.xml on Hive's classpath:
<property>
<name>hive.aux.jars.path</name>
<value>file:///etc/hbase/conf</value>
</property>
After editing hive-site.xml, distribute both hbase-site.xml and hive-site.xml to all Hive clients and to every node that runs HiveServer or the Hive CLI. This ensures that all Hive components resolve the same HBase/ZooKeeper configuration.
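If you prefer not to copy hbase-site.xml around, the ZooKeeper quorum can also be set per session on the command line (the hostnames below are illustrative):

```
hive -hiveconf hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com
```

This overrides only that one property for the session, which is handy for testing against a second cluster.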
Create a new HBase table from Hive
You can define a Hive table that is backed by a new HBase table using HBaseStorageHandler and the hbase.columns.mapping property. For example:
CREATE TABLE hbase_test (
key1 STRING,
col1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,cf1:c1"
)
TBLPROPERTIES (
"hbase.table.name" = "hive_test"
);
This statement tells Hive to:
- Create an HBase table named hive_test if it does not exist.
- Map key1 to the HBase row key (denoted by :key).
- Map col1 to column c1 in column family cf1.
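To populate the new table, you insert from an existing Hive table; a minimal sketch, where source_table is a hypothetical table with two string columns:

```sql
INSERT OVERWRITE TABLE hbase_test
SELECT id, name
FROM source_table;
```

Note that, unlike with native Hive tables, OVERWRITE here does not truncate the HBase table first: rows are written (and replaced) per row key, and rows absent from the query result are left untouched.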
In HBase shell, the table then looks like:
hbase(main):001:0> describe 'hive_test'
DESCRIPTION ENABLED
{NAME => 'hive_test',
FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.1190 seconds
From here, you can run Hive queries like SELECT key1, col1 FROM hbase_test and Hive will read directly from HBase.
Attaching Hive to an existing HBase table
More often, you already have data in HBase and want to query it from Hive. In that case, you use CREATE EXTERNAL TABLE so Hive does not own the HBase table lifecycle.
Inspect the HBase schema first
Start by inspecting the HBase table to find its column families and qualifiers:
hbase(main):003:0> describe 't1'
DESCRIPTION ENABLED
{NAME => 't1',
FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
MIN_VERSIONS => '0', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0700 seconds
In this example, the table t1 has a single column family f1. We’ll map a Hive column to one of its qualifiers, for example f1:c1.
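A quick scan shows the data the mapping will expose; the row and timestamp below are illustrative:

```
hbase(main):004:0> scan 't1'
ROW          COLUMN+CELL
 row1        column=f1:c1, timestamp=1335517800000, value=value1
1 row(s) in 0.0400 seconds
```

Each scanned cell's row becomes key1 and the f1:c1 value becomes col1 in the Hive table defined next.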
Create an EXTERNAL table in Hive
In this old integration, Hive cannot ALTER non-native (HBase-backed) tables, so the column mapping must be correct from the start. Defining the table as EXTERNAL also means Hive does not own the HBase table's lifecycle: dropping the Hive table leaves the HBase table and its data intact.
CREATE EXTERNAL TABLE hbase_test2 (
key1 STRING,
col1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,f1:c1"
)
TBLPROPERTIES (
"hbase.table.name" = "t1"
);
Here we:
- Map key1 to the HBase row key (:key).
- Map col1 to column c1 in family f1.
- Tell Hive the underlying HBase table is t1 via hbase.table.name.
From Hive’s perspective, the table looks like a regular two-column table:
hive> describe hbase_test2;
OK
key1 string from deserializer
col1 string from deserializer
Time taken: 0.106 seconds
You can now run SELECT queries on hbase_test2, and Hive will read from the existing HBase table t1. Depending on your setup, INSERT operations can also write into HBase through this mapping.
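For example, a point lookup by row key through the mapping (the key value is illustrative):

```sql
SELECT col1
FROM hbase_test2
WHERE key1 = 'row1';
```

Be aware that in this era such queries were typically executed as MapReduce scans, so even a single-key lookup could be far slower than the equivalent HBase get.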
Things to keep in mind
- This integration is tied to specific Hive/HBase versions (Hive 0.9 with HBase 0.92 in this example).
- Schema evolution is limited: non-native tables cannot be freely altered.
- Performance depends heavily on HBase table design (row key, regions, compression and block cache).
- For new workloads, prefer modern storage formats (Parquet/Iceberg) and engines that optimize for analytical queries.
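For contrast, on a modern Hive the analytics-friendly default is simply a columnar native table; a minimal sketch (table and columns are illustrative, and STORED AS PARQUET requires a much newer Hive than 0.9):

```sql
CREATE TABLE events (
  key1 STRING,
  col1 STRING
)
STORED AS PARQUET;
```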
If you still run legacy Hive-on-HBase tables, this pattern provides a clear way to expose them to SQL without rewriting your storage layer immediately. It also helps when planning a migration by making HBase data visible in SQL for validation and export.