Error while writing event data to HDFS through Flume

Problem description:

I am using the CDH3 Update 4 tarball for development. I have Hadoop up and running. I have also downloaded the equivalent Flume tarball from Cloudera, viz. 1.1.0, and I am trying to write a tail of a log file into HDFS using the hdfs-sink. When I run the Flume agent, it starts fine, but it errors out as soon as it tries to write new event data to HDFS. I couldn't find a better group to post this question than *. Here is the Flume configuration I am using:

agent.sources=exec-source 
agent.sinks=hdfs-sink 
agent.channels=ch1 

agent.sources.exec-source.type=exec 
agent.sources.exec-source.command=tail -F /locationoffile 

agent.sinks.hdfs-sink.type=hdfs 
agent.sinks.hdfs-sink.hdfs.path=hdfs://localhost:8020/flume 
agent.sinks.hdfs-sink.hdfs.filePrefix=apacheaccess 

agent.channels.ch1.type=memory 
agent.channels.ch1.capacity=1000 

agent.sources.exec-source.channels=ch1 
agent.sinks.hdfs-sink.channel=ch1 
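For reference, an agent with this configuration would normally be started through the flume-ng launcher shipped in the Flume tarball; the conf directory layout and the file name conf/flume.conf below are assumptions about the local setup, and --name must match the agent. prefix used in the properties above:

# start the agent; 'agent' must match the property prefix (agent.sources, agent.sinks, ...)
bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent -Dflume.root.logger=INFO,console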

And here is a small snippet of the error that shows up on the console when new event data arrives and the agent tries to write it to HDFS:

13/03/16 17:59:21 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/user/hdfs-user/flume/apacheaccess.1363436060424.tmp 
13/03/16 17:59:22 WARN hdfs.HDFSEventSink: HDFS IO error 
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "sumit-HP-Pavilion-dv3-Notebook-PC/127.0.0.1"; destination host is: "localhost":8020; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1164) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) 
    at $Proxy9.create(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) 
    at $Proxy9.create(Unknown Source) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192) 
    at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298) 
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317) 
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1215) 
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1173) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:272) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:261) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:78) 
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:805) 
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1060) 
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270) 
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:369) 
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:65) 
    at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:49) 
    at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:190) 
    at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:50) 
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:157) 
    at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:154) 
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:127) 
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:154) 
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:316) 
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:718) 
    at org.apache.flume.sink.hdfs.HDFSEventSink$1.call(HDFSEventSink.java:715) 
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:138) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
    at java.lang.Thread.run(Thread.java:662) 
Caused by: java.io.IOException: Broken pipe 
    at sun.nio.ch.FileDispatcher.write0(Native Method) 
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) 
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:100) 
    at sun.nio.ch.IOUtil.write(IOUtil.java:71) 
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) 
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:62) 
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:143) 
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:153) 
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:114) 
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) 
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) 
    at java.io.DataOutputStream.flush(DataOutputStream.java:106) 
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:861) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1141) 
    ... 37 more 
13/03/16 17:59:27 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/user/hdfs-user/flume/apacheaccess.1363436060425.tmp 
13/03/16 17:59:27 WARN hdfs.HDFSEventSink: HDFS IO error 
java.io.IOException: Failed on local exception: java.io.IOException: Broken pipe; Host Details : local host is: "sumit-HP-Pavilion-dv3-Notebook-PC/127.0.0.1"; destination host is: "localhost":8020; 
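As a first sanity check, it is worth confirming that a plain Hadoop client on the same machine can reach the NameNode address the sink is pointing at; if these commands succeed while Flume keeps failing, that implicates the client libraries bundled with Flume rather than HDFS itself. The test file name write.test is hypothetical:

# verify the NameNode at the configured address answers a plain Hadoop client
hadoop fs -ls hdfs://localhost:8020/
# create an empty file to confirm this user can write under the sink's target path
hadoop fs -touchz hdfs://localhost:8020/flume/write.test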

As people on the Cloudera mailing list suggest, the probable reasons for this error are:

  1. HDFS safe mode is turned on. Try running hadoop fs -safemode leave and see if the error goes away.
  2. The Flume and Hadoop versions are mismatched. To check this, replace the hadoop-core.jar in Flume's lib folder with the one from your Hadoop installation folder (a shell sketch of both checks follows this list).
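A minimal shell sketch of both checks, assuming $FLUME_HOME and $HADOOP_HOME point at the unpacked Flume and Hadoop tarballs (the exact hadoop-core jar name varies with the CDH build):

# 1. inspect and, if necessary, leave safe mode
#    (per the comment below, the working form is 'hadoop dfsadmin -safemode leave')
hadoop dfsadmin -safemode get
hadoop dfsadmin -safemode leave

# 2. replace Flume's bundled Hadoop client jar with the one the cluster actually runs
mv "$FLUME_HOME"/lib/hadoop-core-*.jar /tmp/
cp "$HADOOP_HOME"/hadoop-core-*.jar "$FLUME_HOME"/lib/

After swapping the jar, restart the agent so it picks up the matching client classes.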

The command is actually 'hadoop dfsadmin -safemode leave'. In my view, that was the crux of the issue. – maksimov 2013-07-16 16:28:25