Introduction to HBase filters:

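All filters take effect on the server side, which is known as predicate pushdown. This ensures that data which has been filtered out is never transferred to the client.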
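
To make the effect of pushdown concrete, here is a minimal sketch in plain Java. It is a toy model and not the HBase client API: the class `PushdownSketch`, its methods, and the `rowsTransferred` counter are all hypothetical names introduced for illustration. It only counts how many rows would cross the network boundary in each case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: hypothetical names, not the HBase client API.
final class PushdownSketch {
    /** Counts rows that cross the simulated network boundary. */
    static int rowsTransferred = 0;

    /** With pushdown: the "server" applies the predicate before sending a row. */
    static List<String> scanWithPushdown(List<String> storedRows, Predicate<String> filter) {
        List<String> sent = new ArrayList<>();
        for (String row : storedRows) {
            if (filter.test(row)) {
                sent.add(row);
                rowsTransferred++; // only matching rows are shipped
            }
        }
        return sent;
    }

    /** Without pushdown: every row is shipped, then filtered on the "client". */
    static List<String> scanThenFilterOnClient(List<String> storedRows, Predicate<String> filter) {
        List<String> received = new ArrayList<>();
        for (String row : storedRows) {
            received.add(row);
            rowsTransferred++; // every row is shipped
        }
        List<String> kept = new ArrayList<>();
        for (String row : received) {
            if (filter.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("user-1", "user-2", "order-1", "order-2", "order-3");
        Predicate<String> onlyUsers = r -> r.startsWith("user-");

        rowsTransferred = 0;
        List<String> pushed = scanWithPushdown(rows, onlyUsers);
        int withPushdown = rowsTransferred;

        rowsTransferred = 0;
        List<String> clientSide = scanThenFilterOnClient(rows, onlyUsers);
        int withoutPushdown = rowsTransferred;

        System.out.println(pushed.equals(clientSide));              // prints "true"
        System.out.println(withPushdown + " vs " + withoutPushdown); // prints "2 vs 5"
    }
}
```

Both paths return the same rows; the difference is only in how much data is moved before the predicate is applied.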

Notes:

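String-based comparators such as RegexStringComparator and SubstringComparator are slower and more resource-intensive than byte-based comparators, because they must convert the given value to a String on every comparison. Substring extraction and regular-expression matching also take extra time.
Filters are originally meant to screen out unwanted information, whereas every filter based on CompareFilter works by returning the values that match.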
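
The extra work done by string-based comparison can be sketched in plain Java. This is an illustration under stated assumptions, not the code of RegexStringComparator or SubstringComparator: the class `ComparatorCostSketch` and its methods are hypothetical, and UTF-8 decoding is assumed. The point is only that the byte path compares raw bytes directly, while the string paths decode a new String on every call before doing any matching.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helpers for illustration; not the HBase comparator classes.
final class ComparatorCostSketch {
    /** Byte-based: compares the raw bytes, with no decoding step. */
    static boolean bytesEqual(byte[] cellValue, byte[] expected) {
        return Arrays.equals(cellValue, expected);
    }

    /** String-based: decodes the value to a String on every call, then runs the regex. */
    static boolean regexMatches(byte[] cellValue, Pattern pattern) {
        String decoded = new String(cellValue, StandardCharsets.UTF_8); // per-call allocation
        return pattern.matcher(decoded).find();
    }

    /** String-based: decodes the value to a String on every call, then searches it. */
    static boolean containsSubstring(byte[] cellValue, String needle) {
        String decoded = new String(cellValue, StandardCharsets.UTF_8); // per-call allocation
        return decoded.contains(needle);
    }

    public static void main(String[] args) {
        byte[] value = "row-42".getBytes(StandardCharsets.UTF_8);
        byte[] expected = "row-42".getBytes(StandardCharsets.UTF_8);

        System.out.println(bytesEqual(value, expected));                        // prints "true"
        System.out.println(regexMatches(value, Pattern.compile("^row-\\d+$"))); // prints "true"
        System.out.println(containsSubstring(value, "-4"));                     // prints "true"
    }
}
```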

Interface for row and column filters directly applied within the regionserver. A filter can expect the following call sequence:

reset() : reset the filter state before filtering a new row.
filterAllRemaining(): true means row scan is over; false means keep going.
filterRowKey(byte[],int,int): true means drop this row; false means include.
filterKeyValue(Cell): decides whether to include or exclude this KeyValue. See Filter.ReturnCode.
transform(KeyValue): if the KeyValue is included, let the filter transform the KeyValue.
filterRowCells(List): allows direct modification of the final list to be submitted.
filterRow(): last chance to drop the entire row based on the sequence of filter calls. For example, filter a row if it doesn't contain a specified column.
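
The call sequence above can be sketched with a simplified model in plain Java. All types here (`ModelFilter`, `Decision`, `RequireColumn`, and the `scan` driver) are hypothetical stand-ins rather than the real org.apache.hadoop.hbase.filter classes; rows and cells are modeled as strings, and the transform and filterRowCells steps are left out for brevity. The example filter follows the filterRow() case mentioned above: it drops a row that does not contain a specified column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the call order; hypothetical types, not the HBase API.
final class FilterSequenceSketch {
    enum Decision { INCLUDE, SKIP }

    interface ModelFilter {
        void reset();                                      // called before each new row
        boolean filterAllRemaining();                      // true: stop the scan
        boolean filterRowKey(String rowKey);               // true: drop this row
        Decision filterCell(String column, String value);  // per-cell decision
        boolean filterRow();                               // true: drop the whole row
    }

    /** Drops any row that lacks the required column. */
    static final class RequireColumn implements ModelFilter {
        private final String requiredColumn;
        private boolean seen;

        RequireColumn(String requiredColumn) { this.requiredColumn = requiredColumn; }

        public void reset() { seen = false; }
        public boolean filterAllRemaining() { return false; }
        public boolean filterRowKey(String rowKey) { return false; }
        public Decision filterCell(String column, String value) {
            if (column.equals(requiredColumn)) {
                seen = true;
            }
            return Decision.INCLUDE;
        }
        public boolean filterRow() { return !seen; }
    }

    /** Drives the filter over each row in the listed order; returns surviving row keys. */
    static List<String> scan(Map<String, Map<String, String>> table, ModelFilter filter) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            filter.reset();
            if (filter.filterAllRemaining()) break;
            if (filter.filterRowKey(row.getKey())) continue;
            int included = 0;
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (filter.filterCell(cell.getKey(), cell.getValue()) == Decision.INCLUDE) {
                    included++;
                }
            }
            if (filter.filterRow()) continue;
            if (included > 0) result.add(row.getKey());
        }
        return result;
    }

    /** Small in-memory table: row1 and row3 have an "email" column, row2 does not. */
    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        table.put("row1", Map.of("name", "a", "email", "a@example.com"));
        table.put("row2", Map.of("name", "b"));
        table.put("row3", Map.of("email", "c@example.com"));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), new RequireColumn("email"))); // prints "[row1, row3]"
    }
}
```

Note that the filter cannot decide in filterCell alone: it only learns whether the column was present after all cells of the row have been seen, which is why the final decision sits in filterRow().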

Filter instances are created once per region/scan. This abstract class replaces the old RowFilterInterface. When implementing your own filter, consider extending FilterBase to avoid much of the repetitive boilerplate code.
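
The idea behind extending a base class can be sketched as follows. This is a hypothetical model of the pattern, not the real FilterBase: `ModelFilterBase` and `RowPrefixFilterModel` are names invented for illustration, and only a few of the hooks are shown. The base class supplies permissive defaults, so a custom filter overrides only the method it cares about.

```java
// Hypothetical types illustrating the base-class pattern; not the real FilterBase.
final class BaseClassSketch {
    /** Supplies "keep everything" defaults for each hook. */
    static abstract class ModelFilterBase {
        public void reset() {}
        public boolean filterAllRemaining() { return false; }
        public boolean filterRowKey(String rowKey) { return false; }
        public boolean filterRow() { return false; }
    }

    /** Custom filter: overrides a single hook to drop rows without the given key prefix. */
    static final class RowPrefixFilterModel extends ModelFilterBase {
        private final String prefix;

        RowPrefixFilterModel(String prefix) { this.prefix = prefix; }

        @Override
        public boolean filterRowKey(String rowKey) {
            return !rowKey.startsWith(prefix); // true means drop this row
        }
    }

    public static void main(String[] args) {
        RowPrefixFilterModel filter = new RowPrefixFilterModel("user-");
        System.out.println(filter.filterRowKey("user-1"));  // prints "false" (row kept)
        System.out.println(filter.filterRowKey("order-1")); // prints "true" (row dropped)
        System.out.println(filter.filterRow());             // prints "false" (inherited default)
    }
}
```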
