HiveH行级更新操作 -- 实现update和delete
在hive执行delete和update操作时,报错FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
操作代码:
hive> select * from tablename where id = 10000000;
Query ID = root_20190109131616_5a231979-1726-4bb3-9fb8-839dd8d082d4
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1545619490751_0354, Tracking URL = http://node00:8088/proxy/application_1545619490751_0354/
Kill Command = /opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/hadoop/bin/hadoop job -kill job_1545619490751_0354
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-01-09 13:16:31,446 Stage-1 map = 0%, reduce = 0%
2019-01-09 13:16:37,624 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.23 sec
MapReduce Total cumulative CPU time: 3 seconds 230 msec
Ended Job = job_1545619490751_0354
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 3.23 sec HDFS Read: 1284990 HDFS Write: 137 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 230 msec
OK
10000000 136****94 0 NULL 86 edfccfa84875884f158132ca1b889a15
Time taken: 17.549 seconds, Fetched: 1 row(s)
hive> update tablename set sex = 1 where id = 10000000;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive> delete from tablename where id = 10000000;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive>
查询相关资料发现:
必须设置这些配置参数以打开Hive中的事务支持,见hive官网
Minimally, these configuration parameters must be set appropriately to turn on transaction support in Hive:
Client Side
hive.support.concurrency – true
hive.enforce.bucketing – true (Not required as of Hive 2.0)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
Server Side (Metastore)
hive.compactor.initiator.on – true (See table below for more details)
hive.compactor.worker.threads – a positive number on at least one instance of the Thrift metastore service
修改CDH中Hive的配置文件hive-site.xml
- hive --> 配置 --> 搜索 hive-site
- 服务端配置
<property> <name>hive.compactor.initiator.on</name> <value>true</value> </property> <property> <name>hive.compactor.worker.threads </name> <value>1</value> </property>
- 客户端配置
<property> <name>hive.support.concurrency</name> <value>true</value> </property> <property> <name>hive.enforce.bucketing</name> <value>true</value> </property> <property> <name>hive.exec.dynamic.partition.mode</name> <value>nonstrict</value> </property> <property> <name>hive.txn.manager</name> <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value> </property>
- 重启hive
再次执行update和delete操作,依然报错。
hive> update ods_user set sex = 1 where id = 10000000;
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table hf_ods.ods_user that does not use an AcidOutputFormat or is not bucketed
hive> delete from ods_user where id = 10000000;
FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table hf_ods.ods_user that does not use an AcidOutputFormat or is not bucketed
hive>
如果一个表要实现update和delete功能,该表就必须支持ACID,而支持ACID,就必须满足以下条件:
- 表的存储格式必须是ORC(STORED AS ORC);
- 表必须进行分桶(CLUSTERED BY (col_name, col_name, …) INTO num_buckets BUCKETS);
- Table property中参数transactional必须设定为True(tblproperties(‘transactional’=‘true’));
修改建表语句:
create table if not exists tablename (
id bigint,
sex tinyint COMMENT '性别',
name String COMMENT '姓名'
) COMMENT '用户表'
partitioned by (year string)
clustered by (id) into 2 buckets
row format delimited fields terminated by '\t'
stored as orc TBLPROPERTIES('transactional'='true');