Spark Build
Original post; please credit the source when reposting: http://agilestyle.iteye.com/blog/2337293
Prerequisites
Hardware: Ubuntu 16.04 (8 GB RAM)
Software: jdk1.7.0_80 + scala-2.11.8 + apache-maven-3.3.9
Configure the Linux environment variables (the hadoop, hbase, hive, and zookeeper entries can be skipped if you don't need them):
vi .bashrc
# setup Java & Hadoop environment
export JAVA_HOME=/home/spark/app/jdk1.7.0_80
export SCALA_HOME=/home/spark/app/scala-2.11.8
export MVN_HOME=/home/spark/app/apache-maven-3.3.9
export HADOOP_HOME=/home/spark/app/hadoop-2.6.0-cdh5.9.0
export HBASE_HOME=/home/spark/app/hbase-1.2.0-cdh5.9.0
export HIVE_HOME=/home/spark/app/hive-1.1.0-cdh5.9.0
export ZOOKEEPER_HOME=/home/spark/app/zookeeper-3.4.9
export PATH=$PATH:${JAVA_HOME}/bin:${MVN_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:${ZOOKEEPER_HOME}/bin:${SCALA_HOME}/bin
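After saving, run `source ~/.bashrc` (or open a new shell) so the exports take effect. A small sanity check before building can catch typos in the paths; this is just a sketch, where `check_home` is a hypothetical helper and the fallback directories are the ones exported above:

```shell
# Verify that each *_HOME variable points at a real directory before
# trusting PATH (the defaults mirror the install layout above).
check_home() {
  # $1 = variable name used in the message, $2 = directory to test
  if [ -d "$2" ]; then
    echo "$1: ok"
  else
    echo "$1: missing ($2)"
  fi
}

check_home JAVA_HOME  "${JAVA_HOME:-/home/spark/app/jdk1.7.0_80}"
check_home SCALA_HOME "${SCALA_HOME:-/home/spark/app/scala-2.11.8}"
check_home MVN_HOME   "${MVN_HOME:-/home/spark/app/apache-maven-3.3.9}"
```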
Configure a Maven mirror; Aliyun is recommended here, since it makes downloading dependencies during the build much faster:
<mirror>
  <id>nexus-aliyun</id>
  <mirrorOf>central</mirrorOf>
  <name>Nexus aliyun</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
Add one more Maven repository (without it, the Spark Project External MQTT module fails to build):
<repository>
  <id>mqtt-repo</id>
  <name>MQTT Repository</name>
  <url>https://repo.eclipse.org/content/repositories/paho-releases</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
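One way to wire both snippets together is a minimal `~/.m2/settings.xml` (an assumption on my part: the post does not say where the author placed the repository, and it could equally go into the `<repositories>` section of Spark's root pom.xml; the `spark-build` profile id is made up). Note that in settings.xml a repository must sit inside a profile that is activated:

```xml
<settings>
  <mirrors>
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
  <profiles>
    <profile>
      <!-- hypothetical profile id -->
      <id>spark-build</id>
      <repositories>
        <repository>
          <id>mqtt-repo</id>
          <name>MQTT Repository</name>
          <url>https://repo.eclipse.org/content/repositories/paho-releases</url>
          <releases><enabled>true</enabled></releases>
          <snapshots><enabled>false</enabled></snapshots>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>spark-build</activeProfile>
  </activeProfiles>
</settings>
```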
Build
On Ubuntu, install the R language dependencies first; otherwise the build fails with:
“Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (sparkr-pkg)”
sudo apt-get install r-base
Spark 1.x is built against Scala 2.10 by default; for a custom build against Scala 2.11, switch the version first:
./change-scala-version.sh 2.11
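Under the hood, the script essentially rewrites the Scala artifact suffix in every pom.xml of the source tree. A rough sketch of the idea (not the script's exact code; `rewrite_scala_suffix` is a name I made up):

```shell
# Rewrite every _2.10 artifact suffix in the POMs under a source tree
# to _2.11, skipping build output directories.
rewrite_scala_suffix() {
  # $1 = source tree root
  find "$1" -name pom.xml -not -path '*/target/*' \
    -exec sed -i 's/\(artifactId.*\)_2\.10/\1_2.11/g' {} +
}
```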
Edit make-distribution.sh: comment out lines 130-142 and add lines 143-146, replacing the values the script would resolve through Maven with hard-coded ones:
#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | tail -n 1)
#SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | fgrep --count "<id>hive</id>";\
#    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#    # because we use "set -o pipefail"
#    echo -n)
VERSION=1.6.3
SCALA_VERSION=2.11.8
SPARK_HADOOP_VERSION=2.6.5
SPARK_HIVE=1
Run make-distribution.sh:
./make-distribution.sh --name hadoop2.6.5 --tgz -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.5 -Dscala-2.11 -Phive -Phive-thriftserver -Pyarn
Wait about half an hour (this depends on your network speed; I built on a corporate intranet with unrestricted access, so it may have been faster).
On success, a .tgz file is generated in the Spark source package directory,
比如:spark-1.6.3-bin-hadoop2.6.5.tgz