从本地非Hadoop计算机将数据上传到在Amazon EC2中运行的HDFS
我在Amazon EC2上设置了两个节点的hadoop集群。它运作良好。我可以通过使用hadoop api(附带java程序)将数据从主节点或与hadoop集群相同的Amazon区域中的其他实例上载到HDFS。从本地非Hadoop计算机将数据上传到在Amazon EC2中运行的HDFS
然而,当我想从我的本地非Hadoop的机器做到这一点,它原来有如下例外:
我再登录到Hadoop的NameNode的检查与命令行。创建文件夹“testdir”,但上传文件“myfile”的大小为0.
==================这是分隔符==== ===========================
这些是例外
Apr 18, 2013 10:40:47 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream createBlockOutputStream
INFO: Exception in createBlockOutputStream 10.196.153.215:50010 java.net.ConnectException: Connection timed out
Apr 18, 2013 10:40:47 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream
INFO: Abandoning block blk_560654195674249927_1002
Apr 18, 2013 10:40:47 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream
INFO: Excluding datanode 10.196.153.215:50010
Apr 18, 2013 10:41:09 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream createBlockOutputStream
INFO: Exception in createBlockOutputStream 10.195.171.154:50010 java.net.ConnectException: Connection timed out
Apr 18, 2013 10:41:09 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream
INFO: Abandoning block blk_1747509888999401559_1002
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream nextBlockOutputStream
INFO: Excluding datanode 10.195.171.154:50010
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer run
WARNING: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ubuntu/testdir/myfile could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829)
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream processDatanodeError
WARNING: Error Recovery for block blk_1747509888999401559_1002 bad datanode[0] nodes == null
Apr 18, 2013 10:41:10 AM org.apache.hadoop.hdfs.DFSClient$DFSOutputStream processDatanodeError
WARNING: Could not get block locations. Source file "/user/ubuntu/testdir/myfile" - Aborting...
Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/ubuntu/testdir/myfile could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3510)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3373)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2589)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2829)
====== ============这是分隔符===============================
这是我的java代码
Path output = new Path("testdir");
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://ec2-23-22-12-173.compute-1.amazonaws.com:9000");
conf.set("hadoop.job.user",ubuntu);
FileSystem.mkdirs(FileSystem.get(conf), output, FsPermission.valueOf("drwxr-xr-x"));
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("./myfile"), output);
==================这是分隔符======================= ======== PS。我已经在安全组中打开了端口9000,50010,并且已经关闭了Linux防火墙。
任何人有任何想法?
谢谢。
此错误背后可能有以下几种原因: 1- DataNode未启动并正在运行。确保它不是这种情况。如果您没有收到任何内容,请尝试在每台服务器上挖掘DN日志。
2-运行DN的机器上的空间小于您通过“dfs.datanode.du.reserved”属性指定的空间。
3-您的DN机器上实际上没有剩余空间。
4- hdfs-site.xml文件中“dfs.data.dir”指定的路径没有剩余空间(可能磁盘作为dfs.data.dir用完了空间)。
5- DN不能将心跳/块报告发送到NN。确保没有与网络相关的问题。
HTH
1,2,3,4已通过验证。它是一个干净的集群,'hadoop dfsadmin -report'命令报告所有节点都可用并且有足够的磁盘空间。我猜测5也可以工作,因为我可以从群集上的任何节点**写入HDFS。我唯一的问题是,当我尝试使用'--config'选项和集群配置从本地机器写入HDFS时。只有将内容上传到HDFS的操作失败,例如'-put',但像'-ls'这样的操作正在工作。 – 2013-04-22 06:19:39
所有权和权限? – Tariq 2013-04-22 10:18:48
我在HDFS上使用与用户相同的用户。从我在1.0.4中读到的默认权限是基于unix的,这意味着如果我尝试与具有与HDFS上用户名相同的用户连接,它应该可以工作。你可能会说我证实了这一点,因为当我尝试连接不同的用户时,它不起作用,甚至没有'-ls'。 – 2013-04-22 10:53:30
你有没有发现任何回答这个问题...如果不是,这里是潜在的“理性” ==>你的客户正试图从他们的私有IP地址访问EC2数据节点(这只对群集可见)而不是公共IP。你可以验证看看你的错误日志:不包括Datanode私有IP而不是公共IP,但我不知道我们应该如何克服这一点。我有同样的问题。欲了解更多信息,请查看此链接:http://www.hadoopinrealworld.com/could-only-be-replicated-to-0-nodes/
确实在HDSF中留有剩余空间。节点运行正常吗? – Tariq 2013-04-18 07:07:28
本周我开始遇到同样的事情。它在一周前运行...你找到了解决方案吗? – 2013-04-20 07:20:22