Using Hive’s HBaseStorageHandler with Existing HBase Tables

Hive’s HBaseStorageHandler lets you expose HBase tables as Hive tables so you can run SELECT and INSERT statements over HBase data. This article shows how to configure Hive to talk to HBase, how to create a new HBase table through Hive, and how to attach Hive EXTERNAL TABLE definitions to existing HBase tables with the correct column family mappings. It reflects a Hive 0.9 / HBase 0.92 era setup and is mainly useful for legacy clusters and migrations.

Note (2025): This guide describes the original HBase integration introduced around HIVE-705 for Hive 0.9 and HBase 0.92. Modern Hadoop stacks often favour storing analytics data in Parquet or Iceberg and querying it via engines like Hive, Impala, Trino or Spark. Use this pattern primarily when you need to understand or maintain existing Hive-on-HBase tables, not for new designs.

Hive–HBase integration in a nutshell

Hive can read from and write to HBase tables using a dedicated storage handler. Once configured, you can:

  • Run SELECT queries over HBase-backed tables.
  • Use INSERT to write data into HBase via Hive.
  • Expose existing HBase tables to Hive using CREATE EXTERNAL TABLE.

The feature is implemented by HBaseStorageHandler and was introduced in HIVE-705. At the time of writing, Hive 0.9 required HBase 0.92 or newer for this integration to work.

Prerequisites and storage handler

The HBase storage handler ships with Hive and should be available in the Hive library directory:

$HIVE_HOME/lib/hive-hbase-handler*.jar

In that era, the handler required:

  • Hadoop 0.20.x or later
  • ZooKeeper 3.3.4 or later
  • Matching HBase and Hive versions (e.g. HBase 0.92 with Hive 0.9)
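
If the handler and its dependencies are not already on Hive’s classpath, a common session-level workaround in that era was to add them explicitly with ADD JAR. A minimal sketch; the paths and version numbers are illustrative and depend on your installation:

-- paths and versions are illustrative; adjust to your installation
ADD JAR /usr/lib/hive/lib/hive-hbase-handler-0.9.0.jar;
ADD JAR /usr/lib/hbase/hbase-0.92.0.jar;
ADD JAR /usr/lib/zookeeper/zookeeper-3.3.4.jar;
ADD JAR /usr/lib/hbase/lib/guava-r09.jar;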

Configuring Hive to see HBase

Hive needs to know where to find the HBase configuration so that it can locate ZooKeeper, the HBase master and the region servers. One simple way is to put HBase’s configuration directory on Hive’s auxiliary classpath via hive.aux.jars.path in hive-site.xml, so that hbase-site.xml is picked up by Hive clients and jobs:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///etc/hbase/conf</value>
</property>
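
If you prefer not to ship the whole configuration directory, the essential piece Hive needs is the ZooKeeper quorum. As a sketch, it can also be set per session from the Hive CLI (the host names are placeholders):

-- host names are placeholders for your ZooKeeper ensemble
SET hbase.zookeeper.quorum=zk1.example.com,zk2.example.com,zk3.example.com;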

After editing hive-site.xml, distribute:

  • hbase-site.xml
  • hive-site.xml

to all Hive clients and nodes that will run HiveServer or the Hive CLI. This ensures that all Hive components resolve the same HBase/ZooKeeper configuration.

Create a new HBase table from Hive

You can define a Hive table that is backed by a new HBase table using HBaseStorageHandler and the hbase.columns.mapping property. For example:

CREATE TABLE hbase_test (
  key1 STRING,
  col1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf1:c1"
)
TBLPROPERTIES (
  "hbase.table.name" = "hive_test"
);

This statement tells Hive to:

  • Create an HBase table named hive_test if it does not exist.
  • Map:
    • key1 to the HBase row key (denoted by :key).
    • col1 to column c1 in column family cf1.

In HBase shell, the table then looks like:

hbase(main):001:0> describe 'hive_test'
DESCRIPTION                                                                 ENABLED
{NAME => 'hive_test',
 FAMILIES => [{NAME => 'cf1', BLOOMFILTER => 'NONE',
 REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}                             true
1 row(s) in 0.1190 seconds

From here, you can run Hive queries like SELECT key1, col1 FROM hbase_test and Hive will read directly from HBase.
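
Writes go through the same mapping: a Hive INSERT turns into HBase puts. A minimal sketch, assuming a native Hive staging table named source_tbl with two string columns already exists:

-- source_tbl is a hypothetical native Hive table with columns (k STRING, v STRING)
INSERT OVERWRITE TABLE hbase_test
SELECT k, v FROM source_tbl;

Note that for HBase-backed tables, OVERWRITE does not delete existing rows; the statement simply issues puts for the selected keys.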

Attaching Hive to an existing HBase table

More often, you already have data in HBase and want to query it from Hive. In that case, you use CREATE EXTERNAL TABLE so Hive does not own the HBase table lifecycle.

Inspect the HBase schema first

Start by inspecting the HBase table to find its column families and qualifiers:

hbase(main):003:0> describe 't1'
DESCRIPTION                                                                 ENABLED
{NAME => 't1',
 FAMILIES => [{NAME => 'f1', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
 MIN_VERSIONS => '0', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}                                                    true
1 row(s) in 0.0700 seconds

In this example, the table t1 has a single column family f1. We’ll map a Hive column to one of its qualifiers, for example f1:c1.

Create an EXTERNAL table in Hive

This old integration does not support ALTER TABLE on non-native (HBase-backed) tables, so you cannot retrofit the mapping later. Instead, you define an EXTERNAL table with the full mapping from the start:

CREATE EXTERNAL TABLE hbase_test2 (
  key1 STRING,
  col1 STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,f1:c1"
)
TBLPROPERTIES (
  "hbase.table.name" = "t1"
);

Here we:

  • Map key1 to the HBase row key (:key).
  • Map col1 to column c1 in family f1.
  • Tell Hive the underlying HBase table is t1 via hbase.table.name.

From Hive’s perspective, the table looks like a regular two-column table:

hive> describe hbase_test2;
OK
key1    string  from deserializer
col1    string  from deserializer
Time taken: 0.106 seconds

You can now run SELECT queries on hbase_test2, and Hive will read from the existing HBase table t1. Depending on your setup, INSERT operations can also write into HBase through this mapping.
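
If a column family holds many or varying qualifiers, the same integration can also map an entire family to a Hive MAP column by ending the mapping entry with the family name and a colon. A minimal sketch against the same t1 table (hbase_test3 is just an illustrative name):

CREATE EXTERNAL TABLE hbase_test3 (
  key1 STRING,
  f1   MAP<STRING, STRING>
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,f1:"
)
TBLPROPERTIES (
  "hbase.table.name" = "t1"
);

Each row’s qualifiers then show up as map keys and the cell values as map values, which sidesteps the rigid one-column-per-qualifier mapping.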

Things to keep in mind

  • This integration is tied to specific Hive/HBase versions (Hive 0.9 with HBase 0.92 in this example).
  • Schema evolution is limited: non-native tables cannot be freely altered.
  • Performance depends heavily on HBase table design (row key, regions, compression and block cache).
  • For new workloads, prefer modern storage formats (Parquet/Iceberg) and engines that optimize for analytical queries.

If you still run legacy Hive-on-HBase tables, this pattern provides a clear way to expose them to SQL without rewriting your storage layer immediately. It also helps when planning a migration by making HBase data visible in SQL for validation and export.
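
For the validation-and-export case, a plain CTAS into a native Hive table is often enough to pull a snapshot out of HBase; the target table name here is illustrative:

-- t1_snapshot is a hypothetical native Hive table materialised from the HBase-backed one
CREATE TABLE t1_snapshot AS
SELECT key1, col1 FROM hbase_test2;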

