HBase is a column-oriented database, storing data by column family and qualifier. When executing a scan, filters help reduce the returned data set to only rows matching specific criteria.
A frequent challenge is filtering on more than one column at once. For example, you may require that two or more specific columns contain valid values before a row qualifies.
The practical solution is to combine multiple SingleColumnValueFilter objects in a FilterList. A FilterList created without an explicit operator defaults to MUST_PASS_ALL, which gives you boolean AND logic across all defined filters.
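import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

List<Filter> list = new ArrayList<Filter>(2);
// Filter on family "fam1", qualifier "VALUE1".
// Declared as SingleColumnValueFilter because setFilterIfMissing is not defined on Filter.
SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
    Bytes.toBytes("fam1"),
    Bytes.toBytes("VALUE1"),
    CompareOp.NOT_EQUAL,
    Bytes.toBytes("DOESNOTEXIST")
);
// Drop rows that do not contain fam1:VALUE1 at all
filter1.setFilterIfMissing(true);
list.add(filter1);
// Filter on family "fam2", qualifier "VALUE2"
SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
    Bytes.toBytes("fam2"),
    Bytes.toBytes("VALUE2"),
    CompareOp.NOT_EQUAL,
    Bytes.toBytes("DOESNOTEXIST")
);
// Drop rows that do not contain fam2:VALUE2 at all
filter2.setFilterIfMissing(true);
list.add(filter2);
// No operator given, so the list uses MUST_PASS_ALL (logical AND)
FilterList filterList = new FilterList(list);
Scan scan = new Scan();
scan.setFilter(filterList);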
Each SingleColumnValueFilter compares the value of a single column (family plus qualifier) against a given value.
Comparing with CompareOp.NOT_EQUAL against a placeholder value such as DOESNOTEXIST is a common way to accept any value other than the placeholder, so the check effectively tests that the column is populated.
The call setFilterIfMissing(true) ensures that rows without the column are excluded. The default is false, which lets rows that lack the column pass the filter.
When wrapped inside a FilterList with the default MUST_PASS_ALL operator, these filters collectively enforce that all conditions must be satisfied before the row is returned.
You can add as many filters as required. This pattern remains common on legacy HBase clusters that use the 1.x client API. In HBase 2.x, CompareOp is deprecated in favor of CompareOperator, but the same structure applies.
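To make the row-selection rules above concrete, here is a minimal plain-Java sketch that models them without an HBase cluster or the HBase client on the classpath. It is an illustration only: ColumnFilterModel, NotEqualCheck, keepsRow, row, and exampleChecks are hypothetical names invented for this sketch, and a row is simplified to a map from "family:qualifier" to a value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the filter semantics. None of these types are HBase classes.
class ColumnFilterModel {

    // Hypothetical stand-in for a SingleColumnValueFilter using NOT_EQUAL.
    static final class NotEqualCheck {
        final String column;           // "family:qualifier"
        final byte[] placeholder;      // value the column must differ from
        final boolean filterIfMissing; // mirrors setFilterIfMissing(...)

        NotEqualCheck(String column, String placeholder, boolean filterIfMissing) {
            this.column = column;
            this.placeholder = placeholder.getBytes(StandardCharsets.UTF_8);
            this.filterIfMissing = filterIfMissing;
        }

        // Returns true when this check lets the row through.
        boolean keeps(Map<String, byte[]> row) {
            byte[] value = row.get(column);
            if (value == null) {
                // Column absent: kept only when filterIfMissing is false
                return !filterIfMissing;
            }
            return !Arrays.equals(value, placeholder);
        }
    }

    // Models a FilterList with MUST_PASS_ALL: every check must keep the row.
    static boolean keepsRow(List<NotEqualCheck> checks, Map<String, byte[]> row) {
        for (NotEqualCheck check : checks) {
            if (!check.keeps(row)) {
                return false;
            }
        }
        return true;
    }

    // Builds a row from alternating column/value arguments.
    static Map<String, byte[]> row(String... columnValuePairs) {
        Map<String, byte[]> cells = new HashMap<>();
        for (int i = 0; i + 1 < columnValuePairs.length; i += 2) {
            cells.put(columnValuePairs[i],
                      columnValuePairs[i + 1].getBytes(StandardCharsets.UTF_8));
        }
        return cells;
    }

    // The same two conditions as the scan in the article.
    static List<NotEqualCheck> exampleChecks() {
        List<NotEqualCheck> checks = new ArrayList<>();
        checks.add(new NotEqualCheck("fam1:VALUE1", "DOESNOTEXIST", true));
        checks.add(new NotEqualCheck("fam2:VALUE2", "DOESNOTEXIST", true));
        return checks;
    }

    public static void main(String[] args) {
        List<NotEqualCheck> checks = exampleChecks();
        // Both columns populated
        System.out.println(keepsRow(checks, row("fam1:VALUE1", "a", "fam2:VALUE2", "b"))); // prints true
        // fam2:VALUE2 missing
        System.out.println(keepsRow(checks, row("fam1:VALUE1", "a")));                     // prints false
        // fam1:VALUE1 holds the placeholder
        System.out.println(keepsRow(checks, row("fam1:VALUE1", "DOESNOTEXIST", "fam2:VALUE2", "b"))); // prints false
    }
}
```

Passing false as the third constructor argument of a NotEqualCheck shows why setFilterIfMissing(true) matters: with false, a row that lacks the column passes that check, which is rarely what you want when testing for populated columns.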