Uploading data to HDFS running on Amazon EC2 from a local non-Hadoop machine

Problem description:

I have set up a two-node Hadoop cluster on Amazon EC2, and it works well. I can upload data to HDFS through the Hadoop API (with a Java program) from the master node, or from other instances in the same Amazon region as the Hadoop cluster.

However, when I try to do the same from my local, non-Hadoop machine, it fails with the exceptions below.

I then logged into the Hadoop NameNode and checked from the command line: the directory "testdir" was created, but the uploaded file "myfile" has a size of 0.

==================这是分隔符==== ===========================

These are the exceptions:

Apr 18, 2013 10:40:47 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream createBlockOutputStream 
INFO: Exception in createBlockOutputStream 10.196.153.215:50010 java.net.ConnectException: Connection timed out 
Apr 18, 2013 10:40:47 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream 
INFO: Abandoning block blk_560654195674249927_1002 
Apr 18, 2013 10:40:47 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream 
INFO: Excluding datanode 10.196.153.215:50010 
Apr 18, 2013 10:41:09 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream createBlockOutputStream 
INFO: Exception in createBlockOutputStream 10.195.171.154:50010 java.net.ConnectException: Connection timed out 
Apr 18, 2013 10:41:09 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream 
INFO: Abandoning block blk_1747509888999401559_1002 
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream 
INFO: Excluding datanode 10.195.171.154:50010 
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer run 
WARNING: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ubuntu/testdir/myfile could only be replicated to 0 nodes, instead of 1 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1070) 
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) 
    at $Proxy1.addBlock(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) 
    at $Proxy1.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829) 

Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream processDatanodeError 
WARNING: Error Recovery for block blk_1747509888999401559_1002 bad datanode[0] nodes == null 
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream processDatanodeError 
WARNING: Could not get block locations. Source file "/user/ubuntu/testdir/myfile" - Aborting... 
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ubuntu/testdir/myfile could only be replicated to 0 nodes, instead of 1 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558) 
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1070) 
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) 
    at $Proxy1.addBlock(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:601) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) 
    at $Proxy1.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589) 
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829) 

================== separator ==================

Here is my Java code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

// Target directory on HDFS and the cluster's NameNode URI
Path output = new Path("testdir");
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://ec2-23-22-12-173.compute-1.amazonaws.com:9000");
conf.set("hadoop.job.user", "ubuntu"); // fixed: the user name must be a quoted string

// Create the directory, then copy the local file into it
FileSystem.mkdirs(FileSystem.get(conf), output, FsPermission.valueOf("drwxr-xr-x"));
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("./myfile"), output);
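
An editor's note, not part of the original post: FileSystem.mkdirs only needs an RPC to the NameNode on port 9000, while copyFromLocalFile must additionally stream the file's bytes directly to a DataNode on port 50010. That is consistent with the directory appearing while the file stays at 0 bytes. A minimal sketch of a metadata-only check that should still succeed from the same local machine, assuming the NameNode URI from the code above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://ec2-23-22-12-173.compute-1.amazonaws.com:9000");
FileSystem fs = FileSystem.get(conf);
// Listing only talks to the NameNode, so it works even when DataNodes are
// unreachable; a length of 0 for "myfile" means no block was ever written.
for (FileStatus status : fs.listStatus(new Path("testdir"))) {
    System.out.println(status.getPath() + "  len=" + status.getLen());
}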

================== separator ==================

PS: I have already opened ports 9000 and 50010 in the security group, and the Linux firewall is turned off.

Does anyone have any ideas?

Thanks.

Is there actually free space left in HDFS? Are the nodes running fine? – Tariq 2013-04-18 07:07:28

I started running into the same thing this week. It was working a week ago... Did you find a solution? – 2013-04-20 07:20:22

There could be several reasons behind this error (a client-side sketch for checking the first few points follows this list): 1- The DataNode is not up and running. Make sure that is not the case; if nothing turns up, try digging into the DN logs on each server.

2- The machine running the DN has less free space than what you specified through the "dfs.datanode.du.reserved" property.

3- There is actually no space left on your DN machine.

4- There is no space left under the path specified by "dfs.data.dir" in the hdfs-site.xml file (perhaps the disk used for dfs.data.dir has run out of space).

5- The DN cannot send heartbeats/block reports to the NN. Make sure there are no network-related issues.
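
Points 1-3 can also be checked from the client side. A minimal sketch against the Hadoop 1.x API (an editor's illustration, not part of the original answer; it reuses the NameNode URI from the question), roughly equivalent to running 'hadoop dfsadmin -report':

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://ec2-23-22-12-173.compute-1.amazonaws.com:9000");
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
// One entry per DataNode the NameNode knows about, with its free space in bytes;
// an empty array or no remaining space would explain the replication failure.
for (DatanodeInfo dn : dfs.getDataNodeStats()) {
    System.out.println(dn.getName() + "  remaining=" + dn.getRemaining());
}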

HTH

1, 2, 3, and 4 have been verified. It is a clean cluster; the 'hadoop dfsadmin -report' command reports that all nodes are available and have plenty of disk space. I assume 5 is fine as well, since I can write to HDFS from any node on the cluster. My only problem is when I try to write to HDFS from my local machine, using the '--config' option with the cluster's configuration. Only the operations that upload content to HDFS fail, such as '-put'; operations like '-ls' work. – 2013-04-22 06:19:39

Ownership and permissions? – Tariq 2013-04-22 10:18:48

I am connecting with the same user as the one on HDFS. From what I read, in 1.0.4 the default permission model is Unix-based, which means that if I connect with a user whose name matches the one on HDFS, it should work. You could say I verified this: when I try to connect as a different user, it does not work, not even '-ls'. – 2013-04-22 10:53:30

Did you ever find an answer to this question? If not, here is the likely "rationale" ==> your client is trying to reach the EC2 DataNodes via their private IP addresses (which are only visible inside the cluster) instead of their public IPs. You can verify this in your error log: the "Excluding datanode" lines show the DataNodes' private IPs, not public ones. I don't know how we are supposed to overcome this, though; I have the same problem. For more information, check this link: http://www.hadoopinrealworld.com/could-only-be-replicated-to-0-nodes/
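
If this private-IP explanation is right, one commonly cited client-side mitigation (an editor's addition, not from the original thread) is to have the DFS client contact DataNodes by hostname rather than by the IP address the NameNode reports, so that EC2's split-horizon DNS resolves to the public IP from outside the cluster. The property below was added by HDFS-3150 and only exists from Hadoop 1.1/2.x onward, so it would not help on the 1.0.4 cluster in the question; there, an SSH tunnel or VPN into the cluster's network is the usual fallback:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://ec2-23-22-12-173.compute-1.amazonaws.com:9000");
// Connect to DataNodes by hostname (Hadoop 1.1+/2.x only); the cluster side
// typically needs dfs.datanode.use.datanode.hostname=true as well.
conf.setBoolean("dfs.client.use.datanode.hostname", true);
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("./myfile"), new Path("testdir"));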