Spark worker cannot connect to master: "Still have 1 requests outstanding when connection is closed"
I want to set up a 2-machine Spark cluster. I am setting it up using two VirtualBox Ubuntu 16.04 guest machines on Windows hosts. The master is an Ubuntu 16.04 guest on a Windows 10 host, and the slave is an Ubuntu 16.04 guest on a Windows 7 host.
I have also done the following on both machines:
- Set up passwordless ssh.
- Installed Java and Spark on both machines.
- Set the PATH variables on both machines.
- Added the IPs of both machines to the /etc/hosts file on both machines.
- Added SPARK_MASTER_IP="master_ip" to the conf/spark-env.sh file on the slave machine.
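The last step above amounts to a one-line config fragment (a sketch; "master_ip" is the question's placeholder for the redacted real address):

```shell
# conf/spark-env.sh on the slave machine ("master_ip" is a placeholder)
export SPARK_MASTER_IP="master_ip"
```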
Now, when I start the master, it starts correctly. I can access the Spark Master web UI at master_ip:8080.
However, when I try to start the worker on the slave machine using sudo ./start-slave.sh master_ip:8080, the following happens:
The worker starts, and I can access its web UI at slave_ip:8081, but the worker cannot connect to the master, does not show up in the Spark Master web UI, and the following error appears in the worker log file:
Slave Log 1 (I cannot post more than two links, so I cannot post the complete slave log.)

The following work:
- Passwordless ssh to the slave from both machines.
- The nc -v ip port command to the ports of both machines (the command succeeds).

Slave log:
Spark Command: /usr/lib/jvm/java-8-oracle/jre//bin/java -cp /home/clusterslave/spark/spark-2.2.0-bin-hadoop2.7/conf/:/home/clusterslave/spark$
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/08/29 00:41:32 INFO Worker: Started daemon with process name: [email protected]
17/08/29 00:41:32 INFO SignalUtils: Registered signal handler for TERM
17/08/29 00:41:32 INFO SignalUtils: Registered signal handler for HUP
17/08/29 00:41:32 INFO SignalUtils: Registered signal handler for INT
17/08/29 00:41:32 WARN Utils: Your hostname, clusterslave-VirtualBox resolves to a loopback address: 127.0.0.1; using master_ip instead ($
17/08/29 00:41:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/08/29 00:41:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/29 00:41:33 INFO SecurityManager: Changing view acls to: root
17/08/29 00:41:33 INFO SecurityManager: Changing modify acls to: root
17/08/29 00:41:33 INFO SecurityManager: Changing view acls groups to:
17/08/29 00:41:33 INFO SecurityManager: Changing modify acls groups to:
17/08/29 00:41:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); $
17/08/29 00:41:34 INFO Utils: Successfully started service 'sparkWorker' on port 38929.
17/08/29 00:41:34 INFO Worker: Starting Spark worker slave_ip:38929 with 4 cores, 5.8 GB RAM
17/08/29 00:41:34 INFO Worker: Running Spark version 2.2.0
17/08/29 00:41:34 INFO Worker: Spark home: /home/clusterslave/spark/spark-2.2.0-bin-hadoop2.7
17/08/29 00:41:34 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
17/08/29 00:41:34 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://slave_ip:8081
17/08/29 00:41:34 INFO Worker: Connecting to master master_ip:8080...
17/08/29 00:41:34 INFO TransportClientFactory: Successfully created connection to /master_ip:8080 after 105 ms (0 ms spent in bootstraps)
17/08/29 00:41:34 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /master_ip:8080 is closed
17/08/29 00:41:34 WARN Worker: Failed to connect to master master_ip:8080
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
    at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.s$
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Connection from /master_ip:8080 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146)
    at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:108)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:278)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
    at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    ... 1 more
17/08/29 00:41:42 INFO Worker: Retrying connection to master (attempt # 1)
The following also works fine:
- From the slave to the master:
  Connecting to master_ip:8080 port [tcp/http-alt] succeeded!
- From the master to the slave:
  Connecting to slave_ip:8081 port [tcp/tproxy] succeeded!
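The connectivity checks above can be reproduced with a small script (a sketch; `check_port` is a hypothetical helper built on bash's `/dev/tcp` redirection, `MASTER_IP` is a placeholder for the real address, and the port list assumes Spark's defaults of 7077 for RPC and 8080 for the web UI):

```shell
#!/usr/bin/env bash
# Sketch: verify that the master's ports are reachable from the worker machine.
MASTER_IP="master_ip"   # placeholder; replace with the real master address

check_port() {
  # Returns 0 when a TCP connection to host $1, port $2 succeeds within 5 seconds.
  timeout 5 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

for port in 7077 8080; do
  if check_port "$MASTER_IP" "$port"; then
    echo "port $port reachable"
  else
    echo "port $port NOT reachable"
  fi
done
```

This avoids depending on nc being installed; `nc -z -v host port` gives an equivalent check where it is available.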
I have also tried dropping the firewall on both machines, but the problem still persists.
Please help me solve this problem. Thanks.
You should allow all TCP traffic between the two machines, disable the firewalls, and try giving the IP addresses directly rather than DNS names. I would suggest first trying the two VMs on the same host, to rule out network problems. By the way, why don't you try it on Docker?
For your use case, follow these steps:

On the master:
bin/spark-class org.apache.spark.deploy.master.Master --ip 10.10.10.01
as in this link: https://github.com/2dmitrypavlov/sparkDocker/blob/master/master_ip.sh

Then start a slave on the master as well, since you want to use its resources, and make sure the slave on one VM can connect to it:
bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.10.10.01:7077 --webui-port 8081
After that works, do the same on the second VM:
bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.10.10.01:7077 --webui-port 8081
Go step by step like this; it will be easier to find the problem. If you decide to use Docker, here is an image with instructions: https://github.com/2dmitrypavlov/sparkDocker.
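The steps above can be collected into one sketch. Note that the worker command registers against the master's RPC port 7077 (the spark:// URL), not the 8080 web UI port the question's start-slave.sh invocation used; `SPARK_HOME` and the IP are placeholders:

```shell
#!/usr/bin/env bash
# Sketch of the sequence above; SPARK_HOME and MASTER_IP are placeholders.
SPARK_HOME="/home/clusterslave/spark/spark-2.2.0-bin-hadoop2.7"
MASTER_IP="10.10.10.01"
MASTER_URL="spark://${MASTER_IP}:7077"   # RPC port 7077, not the 8080 web UI port

# 1. On the master VM, start the master:
#    "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.master.Master --ip "$MASTER_IP"

# 2. Start a worker on the master VM itself, then run the same command on the
#    second VM once the first worker shows up in the master's web UI:
#    "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.worker.Worker "$MASTER_URL" --webui-port 8081

echo "$MASTER_URL"
```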
Thank you for your response, sir. How do I allow all TCP traffic between the two machines? –
I tried using the docker image at https://github.com/2dmitrypavlov/sparkDocker, but after executing the slave command I do not see any slave connected to the master. –
I also tried the commands you mentioned with 10.10.10.01 replaced by master_ip, and I got the same result. The same error. –
You need to export SPARK_MASTER_HOST=(master ip) instead of SPARK_MASTER_IP in the spark-env.sh file on both the master and the slave. Also export SPARK_LOCAL_IP on both machines. –
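Following this comment, the spark-env.sh on each machine would look something like the sketch below (the addresses are placeholders; SPARK_LOCAL_IP should be each machine's own address, which also silences the "resolves to a loopback address" warning seen in the log):

```shell
# conf/spark-env.sh -- on BOTH machines (addresses are placeholders)
export SPARK_MASTER_HOST=192.168.1.10   # the master's IP; replaces SPARK_MASTER_IP
export SPARK_LOCAL_IP=192.168.1.10      # on the slave, set this to the slave's own IP
```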
The Spark master's host is Windows 10 and the Spark slave's host is Windows 7, though both are Ubuntu 16.04 running as guest OSes in VirtualBox. –
You should post the relevant log text in the question body. Off-site links will rot, making this question less useful to others. – jdv
Please do not use comments to update the question. Edit the question body and add the additional information that way. – jdv