NiFi PutHiveStreaming processor with Hive: Failed connecting to EndPoint


Problem description:

Could someone help with this issue on NiFi 1.3.0 and Hive? I see the same error with Hive 1.2 and Hive 2.1.1. The Hive table is partitioned, bucketed, and stored as ORC.

The partition gets created on HDFS, but writing the data fails. The relevant log lines are below:

[5:07 AM] papesdiop: Failed connecting to EndPoint {metaStoreUri='thrift://localhost:9083', database='mydb', table='guys', partitionVals=[dev] } 
[5:13 AM] papesdiop: I get in log see next, hope it might help too: 
[5:13 AM] papesdiop: Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable to acquire lock on {metaStoreUri='thrift://localhost:9083', database='mydb', table='guys', partitionVals=[dev] } 
  at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:578) 

Full trace log:

Reconnecting.
org.apache.thrift.transport.TTransportException: null
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:3906)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:3893)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1863)
    at sun.reflect.GeneratedMethodAccessor380.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
    at com.sun.proxy.$Proxy126.lock(Unknown Source)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:573)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:547)
    at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:261)
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:73)
    at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
    at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:964)
    at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:875)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$null$40(PutHiveStreaming.java:676)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$44(PutHiveStreaming.java:673)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:627)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$36(PutHiveStreaming.java:551)
    at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
    at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:551)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2017-09-07 06:41:31,015 DEBUG [Timer-4] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=13ed53d2-015e-1000-c7b1-5af434c38751] Start sending heartbeat on all writers
2017-09-07 06:41:31,890 INFO [Timer-Driven Process Thread-3] hive.metastore Trying to connect to metastore with URI thrift://localhost:9083
2017-09-07 06:41:31,893 INFO [Timer-Driven Process Thread-3] hive.metastore Connected to metastore.
2017-09-07 06:41:31,911 ERROR [Timer-Driven Process Thread-3] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=13ed53d2-015e-1000-c7b1-5af434c38751] Failed to create HiveWriter for endpoint: {metaStoreUri='thrift://localhost:9083', database='default', table='guys', partitionVals=[dev] }: org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://localhost:9083', database='default', table='guys', partitionVals=[dev] }
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://localhost:9083', database='default', table='guys', partitionVals=[dev] }
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:79)
    at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
    at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:964)
    at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:875)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$null$40(PutHiveStreaming.java:676)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$44(PutHiveStreaming.java:673)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2136)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2106)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:627)
    at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$36(PutHiveStreaming.java:551)
    at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
    at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
    at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:551)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1120)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.nifi.util.hive.HiveWriter$TxnBatchFailure: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://localhost:9083', database='default', table='guys', partitionVals=[dev] }
    at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:264)
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:73)
    ... 24 common frames omitted
Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable to acquire lock on {metaStoreUri='thrift://localhost:9083', database='default', table='guys', partitionVals=[dev] }
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:578)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:547)
    at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:261)
    ... 25 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: null
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:3906)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:3893)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1863)
    at sun.reflect.GeneratedMethodAccessor380.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
    at com.sun.proxy.$Proxy126.lock(Unknown Source)
    at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:573)
    ... 27 common frames omitted
2017-09-07 06:41:31,911 ERROR [Timer-Driven Process Thread-3] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=13ed53d2-015e-1000-c7b1-5af434c38751] Error connecting to Hive endpoint: table guys at thrift://localhost:9083
2017-09-07 06:41:31,911 DEBUG [Timer-Driven Process Thread-3] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=13ed53d2-015e-1000-c7b1-5af434c38751] has chosen to yield its resources; will not be scheduled to run again for 1000 milliseconds
2017-09-07 06:41:31,912 ERROR [Timer-Driven Process Thread-3] o.a.n.processors.hive.PutHiveStreaming PutHiveStreaming[id=13ed53d2-015e-1000-c7b1-5af434c38751] Hive Streaming connect/write error, penalizing flow file and routing to retry. org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://localhost:9083', database='default', table='guys', partitionVals=[dev]}

The Hive table:

CREATE TABLE mydb.guys (
  firstname string,
  lastname string)
PARTITIONED BY (
  job string)
CLUSTERED BY (firstname) INTO 10 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS ORC
LOCATION 'hdfs://localhost:9000/user/papesdiop/guys'
TBLPROPERTIES ('transactional'='true')
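
For reference, a quick way to double-check the transactional setup from Beeline or the Hive CLI (just a sketch using the standard Hive ACID property names; the values shown on this cluster may of course differ):

    USE mydb;

    -- The table must be ORC, bucketed, and flagged transactional for Hive Streaming:
    SHOW TBLPROPERTIES guys;
    DESCRIBE FORMATTED guys;

    -- Hive Streaming / ACID also needs the transaction manager and compactor enabled
    -- (DbTxnManager, concurrency on, at least one compactor worker thread). SET with
    -- no value only prints what this session sees; the metastore side must match.
    SET hive.txn.manager;
    SET hive.support.concurrency;
    SET hive.compactor.initiator.on;
    SET hive.compactor.worker.threads;

    -- If a writer seems stuck on "Unable to acquire lock", list the current locks:
    SHOW LOCKS guys;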

Thanks in advance.

If it is failing during the write to HDFS, perhaps your user does not have permission to write to the target directory? If you can get more information from the full stack trace, please add it to your question, as it helps with diagnosing the problem. When I ran into this a while back, it was because my NiFi user needed to be created on the target OS and added to the appropriate HDFS group(s) in order for PutHiveStreaming to have permission to write the ORC files into HDFS.
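
For example, something along these lines (run from the Hive CLI, where the dfs command passes through to the HDFS shell; "nifi" and the group name below are only placeholders for whatever user and group actually run your NiFi instance) should show whether that user can write to the table location:

    -- Check ownership and permissions on the table and partition directories
    dfs -ls hdfs://localhost:9000/user/papesdiop;
    dfs -ls -R hdfs://localhost:9000/user/papesdiop/guys;

    -- If the owner or permissions are wrong, an HDFS superuser can fix them, e.g.:
    -- dfs -chown -R nifi:hadoop hdfs://localhost:9000/user/papesdiop/guys;
    -- dfs -chmod -R 770 hdfs://localhost:9000/user/papesdiop/guys;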


Hi Matt, I use the same user for all of the services (hdfs, yarn, hiveserver2, metastore, nifi). The partition is created dynamically in HDFS, but only an empty file ends up at delta_0022710_0022714/bucket_00001_flush_length. Full stacktrace log: – Papesdiop


I edited the question with the full trace log. Thanks for your help – Papesdiop


Are you using Apache NiFi or Hortonworks DataFlow? Is your cluster Apache Hadoop or a vendor's distribution? This error means there is a Thrift mismatch, which usually indicates incompatible versions – mattyb
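
If your Hive server is 2.1 or later, one quick check (a sketch; the version() function only exists as of Hive 2.1) is to ask the server itself which version it runs and compare that against the Hive client libraries bundled with your NiFi/HDF build:

    -- Run in Beeline against HiveServer2; a client/server version mismatch on the
    -- metastore Thrift protocol typically produces exactly this kind of TTransportException.
    SELECT version();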