Hive LDAP Authentication and Sentry Authorization Configuration

Reference: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cdh_sg_hiveserver2_security.html#topic_9_1

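Since CDH 5.7, HiveServer2 can use Kerberos and LDAP authentication at the same time, and the benefit is obvious: Kerberos secures the cluster itself, while LDAP gives Hive, Impala and similar services plain username/password logins, which is much simpler for users than working with keytabs.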

Enabling LDAP Authentication with HiveServer2 using OpenLDAP

<property>
  <name>hive.server2.authentication</name>
  <value>LDAP</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.url</name>
  <value>LDAP_URL</value>
</property>
<property>
  <name>hive.server2.authentication.ldap.baseDN</name>
  <value>LDAP_BaseDN</value>
</property>

Adding just these three properties to hive-site.xml is enough to enable LDAP authentication.
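Before restarting HiveServer2 it can help to check that a user can actually bind against LDAP_URL. The sketch below uses the JDK's built-in JNDI LDAP provider and assumes that, with ldap.baseDN set, HiveServer2 binds as uid=<user>,<LDAP_BaseDN> (the OpenLDAP-style default in Hive of this era); verify that against your Hive version. The class name LdapBindCheck is hypothetical.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Sketch: test LDAP credentials with a simple bind.
// ASSUMPTION: HiveServer2 builds the bind DN as uid=<user>,<baseDN> when
// hive.server2.authentication.ldap.baseDN is set.
final class LdapBindCheck {

    // Builds the bind DN from the user name and the configured baseDN.
    static String bindDn(String user, String baseDn) {
        return "uid=" + user + "," + baseDn;
    }

    // Returns true if a simple bind with these credentials succeeds.
    static boolean canBind(String ldapUrl, String baseDn, String user, String password) {
        if (password == null || password.isEmpty()) {
            // An empty password turns a simple bind into an unauthenticated bind; reject it.
            return false;
        }
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, bindDn(user, baseDn));
        env.put(Context.SECURITY_CREDENTIALS, password);
        try {
            DirContext ctx = new InitialDirContext(env);
            ctx.close();
            return true;
        } catch (NamingException e) {
            // Wrong password, unknown user or unreachable server all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 4) {
            System.err.println("usage: LdapBindCheck LDAP_URL LDAP_BaseDN user password");
            return;
        }
        System.out.println(canBind(args[0], args[1], args[2], args[3]) ? "bind OK" : "bind FAILED");
    }
}
```

If this fails for a user, HiveServer2 LDAP logins for that user are likely to fail as well, so it is a cheap check to run first.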


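Restart the service and test over JDBC. My cluster already had Kerberos enabled and now has LDAP authentication turned on as well, so let's test both: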

// LDAP credentials passed as connection-URL parameters
String url = "jdbc:hive2://node1:10000/default;user=LDAP_Userid;password=LDAP_Password";
Connection con = DriverManager.getConnection(url);
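A more complete version of the LDAP connection, as a minimal sketch: it assumes the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver and its dependencies) is on the classpath and reuses the placeholder host node1 from above. Instead of embedding user and password in the URL, it passes them to DriverManager.getConnection(url, user, password), so the password does not show up wherever the URL gets logged. The class name HiveLdapJdbc is hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: log in to HiveServer2 with LDAP credentials and list the databases.
// Assumes hive-jdbc is on the classpath; node1:10000/default are placeholders.
final class HiveLdapJdbc {

    // URL without credentials; user/password are supplied separately.
    static String ldapUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: HiveLdapJdbc LDAP_Userid LDAP_Password");
            return;
        }
        // Load the driver explicitly; harmless if it is also auto-registered.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = ldapUrl("node1", 10000, "default");
        try (Connection con = DriverManager.getConnection(url, args[0], args[1]);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```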

LDAP authentication:

# beeline
Beeline version 1.1.0-cdh5.10.2 by Apache Hive
beeline> !connect jdbc:hive2://10.40.2.93:10000/default;user=jlwang2;password=wangjialong
scan complete in 1ms
Connecting to jdbc:hive2://10.40.2.93:10000/default;user=jlwang2;password=wangjialong
Connected to: Apache Hive (version 1.1.0-cdh5.10.2)
Driver: Hive JDBC (version 1.1.0-cdh5.10.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://10.40.2.93:10000/default> show databases;
INFO  : Compiling command(queryId=hive_20190411091818_089e15d1-08b1-4eed-a909-d8eee32be60a): show databases
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20190411091818_089e15d1-08b1-4eed-a909-d8eee32be60a); Time taken: 0.055 seconds
INFO  : Executing command(queryId=hive_20190411091818_089e15d1-08b1-4eed-a909-d8eee32be60a): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20190411091818_089e15d1-08b1-4eed-a909-d8eee32be60a); Time taken: 0.13 seconds
INFO  : OK

Kerberos authentication:

# beeline
Beeline version 1.1.0-cdh5.10.2 by Apache Hive
beeline> !connect jdbc:hive2://tsczbdnndev1.trinasolar.com:10000/default;principal=hive/<HOST>@<REALM>
scan complete in 1ms
Connecting to jdbc:hive2://tsczbdnndev1.trinasolar.com:10000/default;principal=hive/<HOST>@<REALM>
Connected to: Apache Hive (version 1.1.0-cdh5.10.2)
Driver: Hive JDBC (version 1.1.0-cdh5.10.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://tsczbdnndev1.trinasolar.com:1> 
0: jdbc:hive2://tsczbdnndev1.trinasolar.com:1> show databases;
INFO  : Compiling command(queryId=hive_20190411092323_4938457f-866b-4f72-97c6-cf5e69530144): show databases
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20190411092323_4938457f-866b-4f72-97c6-cf5e69530144); Time taken: 0.046 seconds
INFO  : Executing command(queryId=hive_20190411092323_4938457f-866b-4f72-97c6-cf5e69530144): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20190411092323_4938457f-866b-4f72-97c6-cf5e69530144); Time taken: 0.078 seconds
INFO  : OK
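The Kerberos connection from Java looks much the same, except that the URL carries HiveServer2's service principal and no password. A sketch, assuming the Hive JDBC driver is on the classpath and a valid ticket is already in the cache (for example from kinit); the principal is passed on the command line because the realm is site-specific. In many setups the ticket cache is enough, but some environments also need an explicit Hadoop UserGroupInformation login, which this sketch does not show. HiveKerberosJdbc is a hypothetical name.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: Kerberos-authenticated JDBC connection to HiveServer2.
// Assumes a ticket from kinit and hive-jdbc on the classpath.
final class HiveKerberosJdbc {

    // principal= names the HiveServer2 service principal, not the client user;
    // the client identity comes from the Kerberos ticket cache.
    static String kerberosUrl(String host, int port, String db, String serverPrincipal) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db + ";principal=" + serverPrincipal;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: HiveKerberosJdbc hive/<HOST>@<REALM>");
            return;
        }
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = kerberosUrl("tsczbdnndev1.trinasolar.com", 10000, "default", args[0]);
        try (Connection con = DriverManager.getConnection(url);
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("show databases")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```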

That completes the LDAP authentication setup. Next, let's look at the Sentry authorization settings.

Reference: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cm_sg_sentry_service.html

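Before configuring Sentry, we need to understand one very important concept: user-to-group mapping. Read the following passage carefully and you should understand how Sentry manages users and groups.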

You can configure Sentry to use Hadoop groups. By default, Sentry looks up groups locally, but it can be configured to look up Hadoop groups using LDAP (for Active Directory). User/group information for Sentry, Hive and Impala must be made available for lookup on the following hosts:
Sentry - Groups are looked up on the host the Sentry Server runs on.
Hive - Groups are looked up on the hosts running HiveServer2 and the Hive Metastore.
Impala - Groups are looked up on the Catalog Server and on all of the Impala daemon hosts.
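Because each service resolves groups on its own host, it is worth checking what a given host actually returns for a user. A minimal sketch, assuming Linux hosts where `id -Gn <user>` is available; it follows the same idea as Hadoop's shell-based group mapping (ask the OS), but it is not Hadoop's code. Running it on the Sentry, HiveServer2/Metastore and Impala hosts shows which groups each would see. LocalGroups is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: OS-level group lookup for a user on the local host via `id -Gn`.
final class LocalGroups {

    // `id -Gn` prints group names separated by whitespace on one line.
    static List<String> parseIdOutput(String output) {
        String trimmed = output.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    static List<String> groupsOf(String user) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("id", "-Gn", user)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        // Non-zero exit means the user is unknown on this host: no groups resolved here.
        return p.waitFor() == 0 ? parseIdOutput(out.toString()) : new ArrayList<>();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        for (String user : args) {
            System.out.println(user + " -> " + groupsOf(user));
        }
    }
}
```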

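CDH creates many users by default, and all of these users and groups are managed at the operating-system level. Cloudera recommends using LDAP to manage all users and groups.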

Reference: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cm_sg_ldap_grp_mappings.html

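However, if everything moves to LDAP, the users and groups that CDH creates would have to be re-created there as well, which feels like disturbing the original system. My personal preference is to keep managing users and groups at the OS level and use LDAP only for passwords and login authentication in applications such as Hive and Impala.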

To summarize the passage above:

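1. Every group must exist on the Sentry server host.

2. Hive users and groups must exist on the Hive hosts (HiveServer2 and the Hive Metastore).

3. Impala users and groups must exist on the Impala hosts (the Catalog Server and every Impala daemon host).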

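Putting this together with practical experience:

Create every group on all Sentry, Hive and Impala hosts, and create every user on all Hive and Impala hosts. That covers every place a lookup can happen, so group resolution will not fail.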
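Keeping groups in sync by hand across many hosts is easy to get wrong, so a small per-host check helps. A sketch that reads the local /etc/group and reports which required groups are missing; run it on each Sentry, Hive and Impala host. It only looks at local files, which matches the OS-managed approach above (groups served through NSS/SSSD would need `getent group` instead). The default group names analyst and etl are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch: report required groups missing from this host's /etc/group.
final class GroupCheck {

    // /etc/group lines look like name:password:gid:members; take the name field.
    static Set<String> missingGroups(List<String> etcGroupLines, Set<String> required) {
        Set<String> present = new HashSet<>();
        for (String line : etcGroupLines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            present.add(colon >= 0 ? line.substring(0, colon) : line);
        }
        Set<String> missing = new TreeSet<>(required);
        missing.removeAll(present);
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default group names; pass your own as arguments.
        Set<String> required = new TreeSet<>(
                args.length > 0 ? Arrays.asList(args) : Arrays.asList("analyst", "etl"));
        List<String> lines = Files.readAllLines(Paths.get("/etc/group"), StandardCharsets.UTF_8);
        Set<String> missing = missingGroups(lines, required);
        System.out.println(missing.isEmpty() ? "all groups present" : "missing: " + missing);
    }
}
```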

For details on granting privileges with Sentry, see: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/sg_hive_sql.html
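In Sentry, privileges are granted to roles and roles are granted to groups, which is why the group setup above matters. A sketch that issues the usual three statements (CREATE ROLE, GRANT ROLE ... TO GROUP, GRANT SELECT ON DATABASE ... TO ROLE) over HiveServer2 JDBC. It assumes the Hive JDBC driver is on the classpath and that the connecting user belongs to a Sentry admin group (sentry.service.admin.group); the role, group and database names are hypothetical, and they are concatenated without quoting, so only pass trusted names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Sketch: give an OS/LDAP group read-only access to one database via a Sentry role.
final class SentryGrants {

    // Statements in dependency order: role first, then role->group, then privilege->role.
    static List<String> readOnlyGrants(String role, String group, String database) {
        return Arrays.asList(
                "CREATE ROLE " + role,
                "GRANT ROLE " + role + " TO GROUP " + group,
                "GRANT SELECT ON DATABASE " + database + " TO ROLE " + role);
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: SentryGrants LDAP_Userid LDAP_Password");
            return;
        }
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://node1:10000/default";
        try (Connection con = DriverManager.getConnection(url, args[0], args[1]);
             Statement stmt = con.createStatement()) {
            // Hypothetical names: role analyst_role, group analyst, database sales.
            for (String sql : readOnlyGrants("analyst_role", "analyst", "sales")) {
                stmt.execute(sql);
            }
        }
    }
}
```

Any user whose OS groups on the Sentry host include analyst then gets SELECT on sales.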