PIG无法读取导致作业失败的本地CSV
问题描述:
相对较新的猪/ hadoop生态系统,并尝试执行一个简单的DUMP时遇到一个令人沮丧的问题。我正试图调用下面的猪脚本(该文件是本地的,而不是HFDS,所以我使用pig -x local
打开猪壳)。PIG无法读取导致作业失败的本地CSV
REGISTER utils.py USING jython AS utils;
events = LOAD '../test/events.csv' USING PigStorage(',') AS (patientid:int, eventid:chararray, eventdesc:chararray, timestamp:chararray, value:float);
events = FOREACH events GENERATE patientid, eventid, ToDate(timestamp, 'yyyy-MM-dd') AS etimestamp, value;
DUMP events;
但是,这样做的时候,我收到以下错误消息(下面失败的工作摘要,完整的PIG堆栈跟踪底部):
Input(s): Failed to read data from "file:///bootcamp/test/events.csv"
Output(s): Failed to produce result in "file/tmp/temp/305054006/tmp-908064458"
猪堆栈跟踪:
ERROR 1066: Unable to open iterator for alias events. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias events. Backend error : java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.pig.PigServer.openIterator(PigServer.java:925)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:746)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:558)
at org.apache.pig.Main.main(Main.java:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:822)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:452)
at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
at org.apache.pig.PigServer.storeEx(PigServer.java:1034)
at org.apache.pig.PigServer.store(PigServer.java:997)
at org.apache.pig.PigServer.openIterator(PigServer.java:910)
... 13 more
Caused by: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:294)
at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:540)
at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.getStats(MapReduceLauncher.java:801)
...20 more
我已经看到了类似的失败的工作问题,但遗憾的是,我还没有设法寻找到目前为止的解决方案。
编辑:我应该提到,当下面的PIG教程在下面的链接,我遇到了同样的问题。
http://www.sunlab.org/teaching/cse8803/fall2016/lab/hadoop-pig/
答
所以,我发现我能够做“转储”文件如下:
tmp = events 100000; --any int larger than number of rows
dump tmp;
我曾见过这里类似的问题,并能够通过运行来解决作为根。
查看答案,意外发布为评论。 – mongolol