Troubleshooting Hadoop: Caused by: java.io.IOException: Stream closed
【1. Symptoms】
Over the past few days, several jobs on our offline computing platform failed. Checking the platform logs, the key part of the error reads:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.io.IOException: Stream closed
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: Stream closed
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2639)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:981)
at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:2007)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:479)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:469)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:188)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:580)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:578)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:578)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:596)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:559)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:424)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
... 11 more
Caused by: java.io.IOException: Stream closed
at java.util.zip.InflaterInputStream.ensureOpen(InflaterInputStream.java:67)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:142)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
at org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2480)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2468)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2539)
... 37 more (state=08S01,code=1)
Closing: 0: jdbc:hive2://60.8.1.23:10000
【2. Diagnosis】
The log pointed to Hadoop, so we searched ALL ISSUES on the Apache JIRA for the key phrase java.io.IOException: Stream closed. There were many hits, but comparing them one by one, HADOOP-12404 matched our log.
Our production cluster runs Hadoop 2.7; the issue is fixed in Hadoop 2.8. The issue describes the problem as follows:
When loading a resource from a URL in the Configuration class, caching must be disabled on the JarURLConnection to avoid sharing the JarFile with other users.
Configuration's parse method calls url.openStream to obtain an InputStream for DocumentBuilder to parse.
According to the JDK source, the call sequence is url.openStream => handler.openConnection.getInputStream => new JarURLConnection => JarURLConnection.connect => factory.get(getJarFileURL(), getUseCaches()) => URLJarFile.getInputStream => JarFile.getInputStream => ZipFile.getInputStream
If URLConnection.getUseCaches() returns true (the default), the URLJarFile is shared across users of the same URL. If another user closes the shared URLJarFile, then, as documented, every InputStream previously returned by URLJarFile.getInputStream is closed as well.
That is why the exception tends to appear when the cluster is under heavy load.
Hadoop 2.8 fixes this by opening a URLConnection explicitly, disabling its cache when it is a JarURLConnection, and parsing from that connection's stream:
diff --git hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
index 0b45429..8801c6c 100644
--- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
+++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java
@@ -34,7 +34,9 @@
import java.io.Writer;
import java.lang.ref.WeakReference;
import java.net.InetSocketAddress;
+import java.net.JarURLConnection;
import java.net.URL;
+import java.net.URLConnection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
@@ -2531,7 +2533,14 @@ private Document parse(DocumentBuilder builder, URL url)
if (url == null) {
return null;
}
- return parse(builder, url.openStream(), url.toString());
+
+ URLConnection connection = url.openConnection();
+ if (connection instanceof JarURLConnection) {
+ // Disable caching for JarURLConnection to avoid sharing JarFile
+ // with other users.
+ connection.setUseCaches(false);
+ }
+ return parse(builder, connection.getInputStream(), url.toString());
}
private Document parse(DocumentBuilder builder, InputStream is,
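The sharing that the patch guards against can be reproduced outside Hadoop. Below is a minimal, self-contained sketch (the class and file names are illustrative, not from the Hadoop code): it packs a single entry into a throwaway jar, opens the same jar: URL through two connections, and shows that with caching on both connections hand back the same JarFile, so closing it through one connection invalidates the stream obtained through the other.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.JarURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class JarCacheDemo {

    // Build a throwaway jar with one (deflated) entry, standing in for a
    // config file packaged inside a jar, and return a jar: URL for the entry.
    static URL entryUrl() throws Exception {
        Path jar = Files.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new ZipEntry("core-site.xml"));
            out.write("<configuration/>".getBytes("UTF-8"));
            out.closeEntry();
        }
        return new URL("jar:" + jar.toUri() + "!/core-site.xml");
    }

    // Do two independent openConnection() calls end up with the same JarFile?
    static boolean sharedJarFile(boolean useCaches) throws Exception {
        URL url = entryUrl();
        JarURLConnection c1 = (JarURLConnection) url.openConnection();
        c1.setUseCaches(useCaches);
        JarURLConnection c2 = (JarURLConnection) url.openConnection();
        c2.setUseCaches(useCaches);
        return c1.getJarFile() == c2.getJarFile();
    }

    // One "user" closes the shared JarFile; the other user's stream dies.
    static boolean streamDiesWhenSharedJarClosed() throws Exception {
        URL url = entryUrl();
        JarURLConnection c1 = (JarURLConnection) url.openConnection();
        JarURLConnection c2 = (JarURLConnection) url.openConnection();
        InputStream in = c2.getInputStream();
        c1.getJarFile().close();          // closes the JarFile both connections share
        try {
            in.read();
            return false;
        } catch (IOException expected) {  // "Stream closed"
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cached connections share a JarFile:   " + sharedJarFile(true));
        System.out.println("uncached connections share a JarFile: " + sharedJarFile(false));
        System.out.println("closing shared JarFile kills streams: " + streamDiesWhenSharedJarClosed());
    }
}
```

With caching enabled the surviving stream fails once the shared JarFile is closed, which is exactly the race Configuration.loadResource loses under load; with setUseCaches(false) each connection gets its own JarFile and the problem disappears.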
【3. Resolution】
Option 1: upgrade Hadoop from 2.7 to 2.8
Our Hadoop cluster underpins more than ten platform products. Upgrading Hadoop would force matching changes in all of them, which is far too much churn, so this option is not feasible at this stage.
Option 2: apply the fix as a patch
First, use Arthas to find out which jar the hiveserver2 process loads the class from.
[hdfs@<host> ~]$ unzip arthas-bin.zip
Find the hiveserver2 process (pid 1244551); at the Arthas startup prompt, enter its serial number, 3:
[hdfs@<host> ~]$ ./as.sh
Arthas script version: 3.4.4
[INFO] JAVA_HOME: /export/server/jdk-1.8.0_211
Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
* [1]: 3970879 org.apache.spark.executor.CoarseGrainedExecutorBackend
[2]: 69477 org.apache.hadoop.hbase.regionserver.HRegionServer
[3]: 1244551 org.apache.hadoop.util.RunJar
[4]: 1884169 org.apache.hadoop.yarn.server.nodemanager.NodeManager
[5]: 3950379 org.apache.spark.executor.CoarseGrainedExecutorBackend
[6]: 3948779 org.apache.spark.executor.CoarseGrainedExecutorBackend
[7]: 3968885 org.apache.spark.executor.CoarseGrainedExecutorBackend
[8]: 547699 org.apache.spark.deploy.yarn.ExecutorLauncher
[9]: 62382 org.apache.hadoop.hdfs.server.datanode.DataNode
[10]: 1839602 org.apache.hadoop.util.RunJar
3
Arthas home: /home/hdfs
Calculating attach execution time...
Attaching to 1244551 using version 3.4.4...
real 0m0.885s
user 0m0.522s
sys 0m0.126s
Attach success.
telnet connecting to arthas server... current timestamp is 1604659919
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
,---. ,------. ,--------.,--. ,--. ,---. ,---.
/ O \ | .--. ''--. .--'| '--' | / O \ ' .-'
| .-. || '--'.' | | | .--. || .-. |`. `-.
| | | || |\ \ | | | | | || | | |.-' |
`--' `--'`--' '--' `--' `--' `--'`--' `--'`-----'
wiki https://arthas.aliyun.com/doc
tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html
version 3.4.4
pid 1244551
time 2020-11-06 18:51:59
View the classloader inheritance tree:
[arthas@1244551]$ classloader -t
+-BootstrapClassLoader
+-sun.misc.Launcher$AppClassLoader@2f7c7260
Affect(row-cnt:44) cost in 44 ms.
Using the classloader's hashcode, locate the Configuration.class file; the process loads it from /export/server/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar:
[arthas@1244551]$ classloader -c 2f7c7260 -r org/apache/hadoop/conf/Configuration.class
jar:file:/export/server/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar!/org/apache/hadoop/conf/Configuration.class
Affect(row-cnt:1) cost in 3 ms.
[arthas@1244551]$ exit
Connection closed by foreign host.
[hdfs@<host> ~]$
【Compiling the Source File】
Search Maven Central (https://search.maven.org/) for org.apache.hadoop:hadoop-common:2.7.3 and download the Configuration.java source file.
Create a new Maven project in MyEclipse, import Configuration.java, and change the project's artifactId in pom.xml.
Edit the file according to the fix published in the issue.
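For reference, a minimal pom.xml for such a patch project could look like the sketch below. The groupId and compiler settings are illustrative; the artifactId and version match the hadoop-common-my-0.0.1.jar built later. Declaring hadoop-common 2.7.3 as provided lets the single Configuration.java compile against the stock API without bundling all of hadoop-common into the patch jar.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>  <!-- illustrative -->
  <artifactId>hadoop-common-my</artifactId>
  <version>0.0.1</version>
  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>
  <dependencies>
    <!-- Supplies every class Configuration.java references at compile time;
         "provided" keeps it out of the packaged patch jar. -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.3</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
```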
On an internet-connected Windows machine, open a CMD prompt, change to the Maven project root (D:\Users\lenovo\Workspaces\MyEclipse 2017 CI\test), and run:
mvn clean compile
If compilation fails, check the position the error reports in the source file; in our case a stray "+" had been carried over from the diff. Fix the formatting and recompile.
Maven downloads some dependencies during the build; wait until it reports "BUILD SUCCESS".
mvn clean package
This packages the compiled classes; again wait for "BUILD SUCCESS".
Set the owner of the resulting jar and copy it into Hive's lib directory:
[hdfs@<host> lib]$ pwd
/export/hive/lib
[root@<host> bin]# chown hdfs.hadoop /home/hdfs/hadoop-common-my-0.0.1.jar
[root@<host> bin]# cp -p /home/hdfs/hadoop-common-my-0.0.1.jar /export/hive/lib/
[root@<host> bin]# su - hdfs
Last login: Fri Nov 6 16:32:42 CST 2020 on pts/0
[hdfs@<host> ~]$ cd /export/hive/lib/
[hdfs@<host> lib]$ ll hadoop-common-my-0.0.1.jar
-rw-r--r-- 1 hdfs hadoop 38689 Nov 6 17:36 hadoop-common-my-0.0.1.jar
hadoop-common-my-0.0.1.jar is now in /export/server/hive-2.3.2/lib, Hive's library directory.
Restart hiveserver2 and look up Configuration.class through the classloader's hashcode again: hadoop-common-my-0.0.1.jar is now resolved first:
[arthas@<pid>]$ classloader -c 6bdf28bb -r org/apache/hadoop/conf/Configuration.class
jar:file:/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar!/org/apache/hadoop/conf/Configuration.class
jar:file:/export/server/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar!/org/apache/hadoop/conf/Configuration.class
Affect(row-cnt:2) cost in 2 ms.
[arthas@<pid>]$ classloader -c 6bdf28bb | grep hadoop-common
file:/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
file:/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
file:/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
file:/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
file:/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
file:/export/server/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3-tests.jar
file:/export/server/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
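This first-entry-wins behavior is ordinary classpath semantics: a URLClassLoader (including the launcher's AppClassLoader) serves a class or resource from the first classpath entry that contains it. A small illustrative sketch, using two throwaway jars that both carry an entry of the same name (class and entry names here are made up):

```java
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;

public class FirstWinsDemo {

    // Write a throwaway jar containing one entry named `name` with content `body`.
    static URL jarWith(String name, String body) throws Exception {
        Path jar = Files.createTempFile("cp", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new ZipEntry(name));
            out.write(body.getBytes("UTF-8"));
            out.closeEntry();
        }
        return jar.toUri().toURL();
    }

    // Resolve `name` against the given classpath order and return its content.
    static String resolve(URL first, URL second, String name) throws Exception {
        try (URLClassLoader cl = new URLClassLoader(new URL[]{first, second}, null);
             InputStream in = cl.getResourceAsStream(name)) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                bos.write(b);
            }
            return bos.toString("UTF-8");
        }
    }

    public static void main(String[] args) throws Exception {
        URL patched = jarWith("marker.txt", "patched");
        URL stock   = jarWith("marker.txt", "stock");
        // Whichever jar comes first on the classpath supplies the entry --
        // the same reason hadoop-common-my-0.0.1.jar in Hive's lib shadows
        // the Configuration class in hadoop-common-2.7.3.jar.
        System.out.println(resolve(patched, stock, "marker.txt"));
        System.out.println(resolve(stock, patched, "marker.txt"));
    }
}
```

Swapping the order of the two URLs swaps which jar's entry is served, which is why placing the patch jar ahead of the stock Hadoop jar is enough to override a single class.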
The sc command shows details of a loaded class: which jar it was loaded from, which classloader loaded it, whether it is an interface, and so on:
[arthas@<pid>]$ sc -d org.apache.hadoop.conf.Configuration
class-info org.apache.hadoop.conf.Configuration
code-source /export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
name org.apache.hadoop.conf.Configuration
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass false
isPrimitive false
isSynthetic false
simple-name Configuration
modifier public
annotation org.apache.hadoop.classification.InterfaceAudience$Public,org.apache.hadoop.classification.InterfaceStability$Stable
interfaces java.lang.Iterable,org.apache.hadoop.io.Writable
super-class +-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@6bdf28bb
classLoaderHash 6bdf28bb
class-info org.apache.hadoop.hdfs.HdfsConfiguration
code-source /export/server/hadoop-2.7.3/share/hadoop/hdfs/hadoop-hdfs-2.7.3.jar
name org.apache.hadoop.hdfs.HdfsConfiguration
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass false
isPrimitive false
isSynthetic false
simple-name HdfsConfiguration
modifier public
annotation org.apache.hadoop.classification.InterfaceAudience$Private
interfaces
super-class +-org.apache.hadoop.conf.Configuration
+-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@6bdf28bb
classLoaderHash 6bdf28bb
class-info org.apache.hadoop.hive.conf.HiveConf
code-source /export/server/hive-2.3.2/lib/hive-common-2.3.2.jar
name org.apache.hadoop.hive.conf.HiveConf
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass false
isPrimitive false
isSynthetic false
simple-name HiveConf
modifier public
annotation
interfaces
super-class +-org.apache.hadoop.conf.Configuration
+-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@6bdf28bb
classLoaderHash 6bdf28bb
class-info org.apache.hadoop.mapred.JobConf
code-source /export/server/apache-tez-0.9.1-bin/lib/hadoop-mapreduce-client-core-2.7.0.jar
name org.apache.hadoop.mapred.JobConf
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass false
isPrimitive false
isSynthetic false
simple-name JobConf
modifier public
annotation org.apache.hadoop.classification.InterfaceAudience$Public,org.apache.hadoop.classification.InterfaceStability$Stable
interfaces
super-class +-org.apache.hadoop.conf.Configuration
+-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@6bdf28bb
classLoaderHash 6bdf28bb
class-info org.apache.hadoop.yarn.conf.YarnConfiguration
code-source /export/server/hadoop-2.7.3/share/hadoop/yarn/hadoop-yarn-api-2.7.3.jar
name org.apache.hadoop.yarn.conf.YarnConfiguration
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass false
isPrimitive false
isSynthetic false
simple-name YarnConfiguration
modifier public
annotation org.apache.hadoop.classification.InterfaceAudience$Public,org.apache.hadoop.classification.InterfaceStability$Evolving
interfaces
super-class +-org.apache.hadoop.conf.Configuration
+-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@6bdf28bb
classLoaderHash 6bdf28bb
class-info org.apache.tez.dag.api.TezConfiguration
code-source /export/server/apache-tez-0.9.1-bin/tez-api-0.9.1.jar
name org.apache.tez.dag.api.TezConfiguration
isInterface false
isAnnotation false
isEnum false
isAnonymousClass false
isArray false
isLocalClass false
isMemberClass false
isPrimitive false
isSynthetic false
simple-name TezConfiguration
modifier public
annotation org.apache.hadoop.classification.InterfaceAudience$Public
interfaces
super-class +-org.apache.hadoop.conf.Configuration
+-java.lang.Object
class-loader +-sun.misc.Launcher$AppClassLoader@6bdf28bb
classLoaderHash 6bdf28bb
Affect(row-cnt:6) cost in 99 ms.
Finally, decompile the loaded class to confirm that the Configuration source the hiveserver2 process is actually running contains the fix:
[arthas@<pid>]$ jad org.apache.hadoop.conf.Configuration
ClassLoader:
Location:
/export/server/hive-2.3.2/lib/hadoop-common-my-0.0.1.jar
/*
* Decompiled with CFR.
*/
package org.apache.hadoop.conf;
...
URLConnection connection = url.openConnection();
if (connection instanceof JarURLConnection) {
connection.setUseCaches(false);
}
return this.parse(builder, connection.getInputStream(), url.toString());
}
...
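As an alternative to jad, the same check can be made from plain Java: a loaded class reports the jar it came from through its ProtectionDomain. A small sketch (the class name is illustrative; to query the live hiveserver2 JVM you would evaluate the equivalent expression inside it, e.g. via Arthas):

```java
import java.security.CodeSource;

public class CodeSourceDemo {

    // Where was this class loaded from? Returns the jar or classes directory,
    // or null when the class has no code source (typical for bootstrap classes).
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Usually null: bootstrap-loaded JDK class.
        System.out.println(locationOf(String.class));
        // Non-null: an application class reports its jar or classes directory,
        // the same information jad's "Location:" line showed above.
        System.out.println(locationOf(CodeSourceDemo.class));
    }
}
```

For the patched cluster, locationOf(org.apache.hadoop.conf.Configuration.class) evaluated inside hiveserver2 should point at hadoop-common-my-0.0.1.jar.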
Restart the hiveserver2 service and run a MapReduce query: it now completes without error.