novatechflow | Alexander Alten

Posts

Showing posts from December, 2012

Historical Impala Setup on RHEL 6 with Kerberos (Impala 0.3 Era)

This is a historical walkthrough from the early Impala 0.3 beta days, showing how Impala was installed manually on RHEL 6, wired to Hive/HDFS configs, integrated with Kerberos using service principals and keytabs, and started via simple scripts. Treat it as a reference for understanding Impala’s components and legacy Kerberos patterns, not as a modern installation guide.

Note (2025): The commands, package names and versions in this article describe an Impala 0.3 beta setup on RHEL/CentOS 6 with Oracle JDK 6 and Cloudera’s early repos. Modern Impala deployments use different packaging, Java versions and security defaults. Use this only for maintaining or understanding legacy CDH-era clusters.

What Impala is (in this historical context)

Impala provides fast, interactive SQL directly on data stored in Apache Hadoop, primarily HDFS and HBase. It reuses: the Hive Metastore and table metadata; Hive-compatible SQL syntax; ODBC/JDBC drivers and UI components (e.g. Hue Bee...

Using Hive’s HBaseStorageHandler with Existing HBase Tables

Hive’s HBaseStorageHandler lets you expose HBase tables as Hive tables so you can run SELECT and INSERT statements over HBase data. This article shows how to configure Hive to talk to HBase, create a new HBase table through Hive and attach Hive EXTERNAL TABLE definitions to existing HBase tables using the correct column family mappings. It reflects a Hive 0.9 / HBase 0.92 era setup and is mainly useful for legacy clusters and migrations.

Note (2025): This guide describes the original HBase integration introduced around HIVE-705 for Hive 0.9 and HBase 0.92. Modern Hadoop stacks often favour storing analytics data in Parquet or Iceberg and querying it via engines like Hive, Impala, Trino or Spark. Use this pattern primarily when you need to understand or maintain existing Hive-on-HBase tables, not for new designs.

Hive–HBase integration in a nutshell

Hive can read from and write to HBase tables using a dedicated storage handler. Once configured, you can: Run SELECT que...
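To make the column-family mapping for an existing HBase table concrete, here is a minimal Python sketch that assembles the Hive EXTERNAL TABLE statement. The helper name, the example table and column names, and the choice of STRING for the row-key column are hypothetical; the STORED BY handler class and the `hbase.columns.mapping` / `hbase.table.name` properties are the ones Hive’s HBase integration uses.

```python
def hbase_external_table_ddl(hive_table: str, hbase_table: str, key_column: str, columns) -> str:
    """Build a Hive EXTERNAL TABLE statement over an existing HBase table.

    Hypothetical helper. `columns` is a list of (hive_name, hive_type,
    "family:qualifier") tuples, in the order they should appear in the
    Hive table after the row-key column (assumed here to be STRING).
    """
    # Every mapped column must name an HBase column family.
    for _, _, fq in columns:
        family, sep, _ = fq.partition(":")
        if not sep or not family:
            raise ValueError(f"expected 'family:qualifier', got {fq!r}")

    hive_cols = [f"{key_column} STRING"] + [f"{name} {typ}" for name, typ, _ in columns]
    # hbase.columns.mapping is positional: entry i describes Hive column i,
    # and ":key" binds the first Hive column to the HBase row key.
    mapping = ",".join([":key"] + [fq for _, _, fq in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(hive_cols)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


if __name__ == "__main__":
    # Example: an HBase table "users" with a column family "info".
    print(hbase_external_table_ddl(
        "hbase_users", "users", "rowkey",
        [("name", "STRING", "info:name"), ("age", "INT", "info:age")],
    ))
```

Keeping the Hive column list and the mapping string in one structure avoids the most common mistake with this handler: the two lists drifting out of order, which silently maps Hive columns onto the wrong HBase qualifiers.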

Fixing Hanging Hive DROP TABLE on PostgreSQL Metastore

On some older Hive deployments with PostgreSQL as the metastore database, DROP TABLE can hang while PostgreSQL shows UPDATE locks on metastore tables. This often happens when certain privilege tables and indexes were not created correctly during an upgrade or manual schema setup. This note shows a legacy DDL patch you can apply to add the missing tables and indexes so DROP TABLE completes successfully again. Always back up your metastore before running any DDL.

Important legacy note: The SQL below matches a specific generation of the Hive metastore schema from around 2013. You should only apply it if you have confirmed that these tables and indexes are missing in your metastore and that the definitions match your Hive version. Always test on a non-production copy of your metastore first.

Symptom

When using PostgreSQL as the Hive metastore database, a statement like:

DROP TABLE xyz;

may hang indefinitely. On the PostgreSQL side, you see long-running transactions and loc...

Fixing Hive “Too Many Counters” MapReduce Failures

When Hive queries use many operators, MapReduce can hit its default counter limit and fail with a “Too many counters” exception. This short note explains why it happens (Hive creates multiple counters per operator), how to raise mapreduce.job.counters.max safely, and how to estimate how many operators your query uses with EXPLAIN so you can tune the setting without guessing.

Symptom

A Hive job fails with an error like:

Ended Job = job_xxxxxx with exception 'org.apache.hadoop.mapreduce.counters.LimitExceededException (Too many counters: 201 max=200)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
Intercepting System.exit(1)

The query might be complex, but otherwise looks syntactically fine. The problem is not your SQL; it’s the number of counters that MapReduce is willing to track.

Why this happens: operators and counters

Hive uses counters to track statistics for built-in operators (see the Hive LanguageManual UDF / op...
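The EXPLAIN-based estimate can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the regex that recognises operator lines is a hypothetical heuristic over EXPLAIN’s text layout, and `counters_per_operator` and `base_counters` are placeholders you would replace with numbers observed on your own cluster (the values 4 and 50 in the usage example are not taken from this post or from Hive).

```python
import re

# Hypothetical heuristic: an EXPLAIN line that is exactly "<Name> Operator"
# (e.g. "Select Operator", "File Output Operator") or "TableScan" counts as
# one operator. Headers such as "Map Operator Tree:" end with a colon and
# are therefore not counted.
OPERATOR_LINE = re.compile(r"^\s*(?:[A-Za-z ]+ Operator|TableScan)\s*$")


def count_operators(explain_text: str) -> int:
    """Count operator lines in the plan text printed by Hive's EXPLAIN."""
    return sum(1 for line in explain_text.splitlines() if OPERATOR_LINE.match(line))


def suggest_counter_limit(
    explain_text: str,
    counters_per_operator: int,  # assumption: measure this on your own cluster
    base_counters: int,          # assumption: counters not tied to Hive operators
    headroom_pct: int = 25,      # extra safety margin on top of the estimate
) -> int:
    """Estimate a value for mapreduce.job.counters.max, rounded up."""
    needed = count_operators(explain_text) * counters_per_operator + base_counters
    # Integer ceiling of needed * (1 + headroom_pct / 100).
    return (needed * (100 + headroom_pct) + 99) // 100


# Abbreviated, illustrative EXPLAIN output (not from a real run).
SAMPLE_EXPLAIN = """\
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        src
          TableScan
            alias: src
            Select Operator
              outputColumnNames: _col0
              File Output Operator
                compressed: false
"""

if __name__ == "__main__":
    print(count_operators(SAMPLE_EXPLAIN))               # 3
    print(suggest_counter_limit(SAMPLE_EXPLAIN, 4, 50))  # 78
```

The headroom is applied with integer arithmetic so the suggested limit is always rounded up rather than truncated; a limit that is one counter too low fails the job in exactly the way shown in the symptom above.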