
Posts

Showing posts from May, 2013

List Hive Table Sizes in HDFS with a Single Shell Command

This quick tip shows how to list all Hive tables in a database together with their HDFS locations and human-readable sizes using a single bash one-liner. It still works on classic Hive CLI setups and can easily be adapted for Beeline or modern Hive deployments.

When you run benchmarks, clean up old data, or just want to understand how much space each Hive table consumes, it is useful to see HDFS locations and sizes side by side. Instead of clicking through UIs, you can ask Hive for every table location and then call hdfs dfs -du -h on each path.

The Hive + HDFS one-liner

The following bash one-liner queries Hive for table locations, extracts the HDFS paths and then prints a human-readable size for each table directory:

    for file in $(hive -S -e "SHOW TABLE EXTENDED LIKE '*'" \
        | grep "location:" \
        | awk 'BEGIN { FS=":" } { printf("hdfs:%s:%s\n",$3,$4) }'); do
      hdfs dfs -du -h "$file"
    done

Typical outp...
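The awk step in the one-liner splits each location: line on colons and rebuilds the URI from fields 3 and 4, so it assumes every location looks like hdfs://host:port/path. A location without an explicit port (for example an HA nameservice such as hdfs://nameservice1/...) would come out with a trailing colon. A minimal alternative sketch, assuming bash and the Hive CLI, strips the location: prefix with sed instead. The function names hive_table_locations and hive_table_sizes are hypothetical, and the database argument is an assumption that defaults to default.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of Hive or HDFS.

# Read "SHOW TABLE EXTENDED" output on stdin and print one location per line.
# Stripping the "location:" prefix keeps the full URI intact, whether or not
# it contains a port.
hive_table_locations() {
  sed -n 's/^[[:space:]]*location://p'
}

# Print one aggregated, human-readable size per table directory.
# $1 is the database name (assumed default: "default").
hive_table_sizes() {
  local db="${1:-default}"
  hive -S -e "SHOW TABLE EXTENDED IN ${db} LIKE '*'" \
    | hive_table_locations \
    | while IFS= read -r loc; do
        # -s summarizes the whole directory instead of listing each child
        hdfs dfs -du -s -h "$loc"
      done
}
```

Call it as hive_table_sizes mydb. Dropping -s gives the per-child listing that the original one-liner prints.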

Querying HBase Data with Impala via Hive’s HBaseStorageHandler

This is a legacy but still useful walkthrough that shows how to expose HBase-resident data to Impala by going through Hive’s Metastore and the HBaseStorageHandler. Using US census ZIP code income data, we create an HBase table, map it with an external Hive table, bulk load the CSV data with Pig and finally query it from Impala. The pattern is mainly relevant today if you are keeping old CDH clusters alive or planning a migration away from Impala-on-HBase towards Parquet or Iceberg tables.

Note (2025): This article describes an older CDH/Impala/HBase pattern based on Hive’s HBaseStorageHandler. It is useful if you still maintain legacy Impala-on-HBase workloads or need to understand how such systems were wired. For new designs you will usually land data in Parquet or Iceberg tables and query them with Impala, Trino or Spark instead of reading directly from HBase.

Context: Impala, Hive Metastore and HBase

Impala uses the Hive Metastore Service to discover tables and their un...
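The excerpt stops before the actual statements. As a minimal sketch of the wiring it describes (create the HBase table, map it with an external Hive table, then let Impala see it through the Metastore), the bash helpers below only print the HBase shell and Hive statements. The table name zip_income, the column family cf and the columns zip, state and median_income are hypothetical placeholders, not the post's schema, and the Pig bulk load is left out.

```shell
#!/usr/bin/env bash
# Hypothetical names: HBase table "zip_income", column family "cf",
# qualifiers "state" and "median_income". Adjust to your own data.
HBASE_TABLE="zip_income"

# HBase shell command that creates the table with one column family.
hbase_create_stmt() {
  printf "create '%s', 'cf'\n" "$HBASE_TABLE"
}

# Hive DDL that registers the HBase table in the Metastore via the
# HBaseStorageHandler. ":key" binds the HBase row key (here the ZIP code)
# to the first Hive column; the other entries map cf:qualifier pairs in order.
hive_mapping_ddl() {
  cat <<EOF
CREATE EXTERNAL TABLE ${HBASE_TABLE} (
  zip STRING,
  state STRING,
  median_income INT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:state,cf:median_income")
TBLPROPERTIES ("hbase.table.name" = "${HBASE_TABLE}");
EOF
}
```

On a cluster you would run hbase_create_stmt | hbase shell, then hive -e "$(hive_mapping_ddl)", and finally impala-shell -q "INVALIDATE METADATA zip_income" so that Impala picks up the new Metastore entry. Printing the statements instead of executing them keeps each step reviewable before it touches the cluster.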