Compiling Spark from Source
Unfortunately, the download link in step 3 of the official site fails under wget.
Workaround: pull the release from the Apache archive instead
$ wget https://archive.apache.org/dist/spark/spark-2.3.3/spark-2.3.3.tgz
$ wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2.tgz
$ tar -zxf spark-2.3.3.tgz -C ~/app/
$ tar -zxf spark-2.4.2.tgz -C ~/app/
Prerequisites for building Spark from source
Maven 3.3.9 or newer
Java 8+
Scala
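Before starting the build, it helps to confirm the tools above are actually on the PATH. A minimal sketch (it only checks presence, not the 3.3.9+/8+ version requirements):

```shell
# Report which of the required build tools are installed
for tool in mvn java scala git; do
  command -v "$tool" >/dev/null 2>&1 && echo "$tool: ok" || echo "$tool: MISSING"
done
```

Run `mvn -version` and `java -version` afterwards to confirm the versions themselves.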
# Memory settings for the Spark build
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=2048m"
When building Spark, the Hadoop version does not have to match the YARN version
./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
-Dhadoop.version=2.7.3 pins the exact Hadoop release within the hadoop-2.7 profile line
mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
A plain mvn package build does not produce a distributable tar.gz.
Use make-distribution.sh instead
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Dhadoop.version=2.6.0-cdh5.7.0 -Phadoop-2.6 -Phive -Phive-thriftserver -Pyarn
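When it finishes, make-distribution.sh drops the tarball in the source root, named from the Spark version plus the --name argument. A sketch of the naming pattern, with the values from the command above:

```shell
VERSION=2.3.3
NAME=2.6.0-cdh5.7.0
# make-distribution.sh produces spark-$VERSION-bin-$NAME.tgz in the source root
echo "spark-${VERSION}-bin-${NAME}.tgz"
```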
Install git first (the script records the git revision).
To speed things up, comment out the version-detection block in make-distribution.sh (it shells out to mvn help:evaluate, which is slow) and hard-code the values:
VERSION=2.3.3
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
The build errors out; don't panic:
[ERROR] Failed to execute goal on project spark-launcher_2.11: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.3.3: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.6.0-cdh5.7.0 in nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public) -> [Help 1]
Fix: add the Cloudera repository to the <repositories> section of pom.xml
<repository>
  <id>cloudera</id>
  <name>cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
(A Maven project build consists of both source code and resource files.)
Another error; again, don't panic:
[error] Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000eec00000, 18350080, 0) failed; error='Cannot allocate memory' (errno=12)
Fix: set the Maven build memory options (errno=12 means the OS refused the allocation, so also make sure the machine has enough free memory)
# Memory settings for the Spark build
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=2048m"
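Since errno=12 means the operating system itself refused the allocation, the JVM flags alone won't help on a memory-starved box. A quick Linux check of what is actually available (adding swap, shown commented out, is one common workaround; the paths are examples):

```shell
# How much memory can the build actually get right now?
grep -E 'MemAvailable|SwapTotal' /proc/meminfo
# If it is less than the -Xmx you asked for, add swap (run as root):
# dd if=/dev/zero of=/swapfile bs=1M count=2048
# chmod 600 /swapfile && mkswap /swapfile && swapon /swapfile
```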
In production we build Spark locally and upload the tarball to the servers; especially for Spark on YARN, you just untar it and go.
Layout of the Spark distribution
The code under examples/ is important; read it.
The jars/ layout differs between Spark 1.x and 2.x (1.x shipped one big assembly jar, 2.x ships individual jars), and production setups have a best practice built around this.
yarn/ holds the YARN-related jars, e.g. spark-2.2.0-yarn-shuffle.jar, which dynamic resource allocation uses.
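That shuffle jar exists because dynamic resource allocation on YARN needs an external shuffle service running inside each NodeManager. A minimal sketch of the wiring (the property names are the standard Spark/YARN ones; values are examples):

```
# spark-defaults.conf
spark.dynamicAllocation.enabled   true
spark.shuffle.service.enabled     true

# yarn-site.xml: register the shuffle service as a NodeManager aux service
#   yarn.nodemanager.aux-services                      mapreduce_shuffle,spark_shuffle
#   yarn.nodemanager.aux-services.spark_shuffle.class  org.apache.spark.network.yarn.YarnShuffleService
# and place the spark-*-yarn-shuffle.jar on the NodeManager classpath
```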
Build succeeded!
$ pwd
/home/hadoop/src/spark-2.3.3
$ tar -zxf spark-2.3.3-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
$ ./spark-shell        # run from the unpacked distribution's bin/ directory