Using Hive’s HBaseStorageHandler with Existing HBase Tables

Hive’s HBaseStorageHandler lets you expose HBase tables as Hive tables so you can run SELECT and INSERT statements over HBase data. This article shows how to configure Hive to talk to HBase, create a new HBase table through Hive, and attach Hive EXTERNAL TABLE definitions to existing HBase tables using the correct column family mappings. It reflects a Hive 0.9 / HBase 0.92 era setup and is mainly useful for legacy clusters and migrations.

Note (2025): This guide describes the original HBase integration introduced around HIVE-705 for Hive 0.9 and HBase 0.92. Modern Hadoop stacks often favour storing analytics data in Parquet or Iceberg and querying it via engines like Hive, Impala, Trino or Spark. Use this pattern primarily when you need to understand or maintain existing Hive-on-HBase tables, not for new designs.

Hive–HBase integration in a nutshell

Hive can read from and write to HBase tables using a dedicated storage handler. Once configured, you can:

- Run SELECT queries over HBase-backed tables.
- Use INSERT to write data into HBase via Hive.
- Expose existing HBase tables to Hive using CREATE EXTERNAL TABLE.

The feature is implemented by HBaseStorageHandler and was introduced in HIVE-705. At the time of writing, Hive 0.9 required HBase 0.92 or newer for this integration to work.

Prerequisites and storage handler

The HBase storage handler ships with Hive and should be available in the Hive library directory:

$HIVE_HOME/lib/hive-hbase-handler*.jar

In that era, the handler required:

- Hadoop 0.20.x or later
- ZooKeeper 3.3.4 or later
- Matching HBase and Hive versions (e.g. HBase 0.92 with Hive 0.9)

Configuring Hive to see HBase

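Hive needs access to the HBase configuration so that it can locate ZooKeeper, the HBase master and the region servers. One simple way is to add HBase’s configuration directory to hive-site.xml through hive.aux.jars.path, as in the snippet that follows. Note that this property normally lists JAR files; if your Hive version does not pick up a directory entry, setting hbase.zookeeper.quorum directly in the Hive configuration is an alternative.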

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///etc/hbase/conf</value>
</property>

After editing hive-site.xml, distribute both hbase-site.xml and hive-site.xml to all Hive clients and to every node that runs HiveServer or the Hive CLI. This ensures that all Hive components resolve the same HBase/ZooKeeper configuration.

Create a new HBase table from Hive

You can define a Hive table that is backed by a new HBase table using HBaseStorageHandler and the hbase.columns.mapping property. For example:

CREATE TABLE hbase_test (
  key1 STRING,
  col1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf1:c1"
)
TBLPROPERTIES (
  "hbase.table.name" = "hive_test"
);

This statement tells Hive to:

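- Create an HBase table named hive_test. Because the Hive table is managed (not EXTERNAL), the statement fails if hive_test already exists in HBase, and dropping the Hive table later also drops the HBase table.
- Map key1 to the HBase row key (denoted by :key).
- Map col1 to column c1 in column family cf1.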

In HBase shell, the table then looks like:

hbase(main):001:0> describe 'hive_test'
DESCRIPTION                                                                 ENABLED
{NAME => 'hive_test',
 FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}                             true
1 row(s) in 0.1190 seconds

From here, you can run Hive queries like SELECT key1, col1 FROM hbase_test and Hive will read directly from HBase.
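
As an illustration of reading and writing through this mapping, the sketch below loads hbase_test from a native Hive table and reads one row back. The staging table staging_rows, its columns id and val, and the key 'row-0001' are assumptions made for this example. Hive 0.9 has no INSERT ... VALUES, so data arrives via INSERT ... SELECT.

```sql
-- Hypothetical native Hive table used as the source (not part of the original setup).
CREATE TABLE staging_rows (id STRING, val STRING);

-- Write through the mapping: id becomes the HBase row key, val is stored in cf1:c1.
-- Source rows that share an id end up as a single HBase row.
INSERT OVERWRITE TABLE hbase_test
SELECT id, val FROM staging_rows;

-- Read one row back by its key ('row-0001' is a made-up key).
SELECT key1, col1 FROM hbase_test WHERE key1 = 'row-0001';
```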

Attaching Hive to an existing HBase table

More often, you already have data in HBase and want to query it from Hive. In that case, you use CREATE EXTERNAL TABLE so Hive does not own the HBase table lifecycle.

Inspect the HBase schema first

Start by inspecting the HBase table to find its column families:

hbase(main):003:0> describe 't1'
DESCRIPTION                                                                 ENABLED
{NAME => 't1',
 FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
 MIN_VERSIONS => '0', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}                                                    true
1 row(s) in 0.0700 seconds

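In this example, the table t1 has a single column family f1. The describe output does not list column qualifiers, because HBase does not store them in the table schema; sampling a few rows with scan 't1', {LIMIT => 5} shows which qualifiers are in use. We’ll map a Hive column to one of them, for example f1:c1.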

Create an EXTERNAL table in Hive

In this older integration, Hive rejects ALTER TABLE on non-native (HBase-backed) tables, so the mapping cannot be adjusted after creation. Define the table as EXTERNAL, with its full column mapping, from the start:

CREATE EXTERNAL TABLE hbase_test2 (
  key1 STRING,
  col1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,f1:c1"
)
TBLPROPERTIES (
  "hbase.table.name" = "t1"
);

Here we:

- Map key1 to the HBase row key (:key).
- Map col1 to column c1 in family f1.
- Tell Hive the underlying HBase table is t1 via hbase.table.name.

From Hive’s perspective, the table looks like a regular two-column table:

hive> describe hbase_test2;
OK
key1    string  from deserializer
col1    string  from deserializer
Time taken: 0.106 seconds

You can now run SELECT queries on hbase_test2, and Hive will read from the existing HBase table t1. INSERT ... SELECT statements write through the same mapping, so rows inserted via Hive land in t1 as cells in f1:c1.
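
When the qualifiers in a family are not known in advance, or vary from row to row, one option is to map the whole family to a Hive MAP column by leaving the qualifier empty in hbase.columns.mapping. The following is a sketch against the same HBase table t1; the Hive table name hbase_test2_all and the column name f1_cols are made up for this example.

```sql
-- Hypothetical second EXTERNAL definition over the same HBase table t1.
-- "f1:" (family with no qualifier) maps every column in f1 into one MAP column:
-- map keys are the qualifiers, map values are the cell values.
CREATE EXTERNAL TABLE hbase_test2_all (
  key1    STRING,
  f1_cols MAP<STRING, STRING>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,f1:"
)
TBLPROPERTIES (
  "hbase.table.name" = "t1"
);

-- Look up a single qualifier from the map.
SELECT key1, f1_cols['c1'] FROM hbase_test2_all LIMIT 10;
```

In either style, the mapping string needs one entry per Hive column, listed in the same order as the column definitions.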

Things to keep in mind

- This integration is tied to specific Hive/HBase versions (Hive 0.9 with HBase 0.92 in this example).
- Schema evolution is limited: non-native tables cannot be freely altered.
- Performance depends heavily on HBase table design (row key, regions, compression and block cache).
- For new workloads, prefer modern storage formats (Parquet/Iceberg) and engines that optimize for analytical queries.
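
Given the schema evolution limit, one workaround for changing the mapping of an EXTERNAL table is to drop the Hive definition and create it again. Dropping an EXTERNAL table removes only the Hive metadata and leaves the HBase table and its data in place. The sketch assumes you want to expose a second qualifier, f1:c2, which is hypothetical and not part of the original example.

```sql
-- Removes only the Hive metadata; the HBase table t1 is untouched
-- because hbase_test2 was created as EXTERNAL.
DROP TABLE hbase_test2;

-- Recreate the definition with an extra column; f1:c2 is a hypothetical qualifier.
CREATE EXTERNAL TABLE hbase_test2 (
  key1 STRING,
  col1 STRING,
  col2 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,f1:c1,f1:c2"
)
TBLPROPERTIES (
  "hbase.table.name" = "t1"
);
```

Avoid this with a managed table (created without EXTERNAL), since dropping it also drops the underlying HBase table.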

If you still run legacy Hive-on-HBase tables, this pattern provides a clear way to expose them to SQL without rewriting your storage layer immediately. It also helps when planning a migration by making HBase data visible in SQL for validation and export.
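
For the validation and export use case, a sketch might look like the following. The target table name t1_export is hypothetical, and RCFILE is chosen only because it was available in Hive 0.9; on Hive 0.13 or later, STORED AS PARQUET could be used instead.

```sql
-- Basic sanity check: count the rows visible through the mapping.
-- This scans the whole HBase table, so run it off-peak on large tables.
SELECT COUNT(1) FROM hbase_test2;

-- Copy the mapped columns into a native Hive table (hypothetical name t1_export).
CREATE TABLE t1_export
STORED AS RCFILE
AS
SELECT key1, col1 FROM hbase_test2;
```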

novatechflow | Alexander Alten
